OORR  59-2 


Office  of  Ordnance  Research 


PROCEEDINGS  OF  THE  FOURTH  CONFERENCE 
ON  THE  DESIGN  OF  EXPERIMENTS  IN  ARMY  RESEARCH 
DEVELOPMENT  AND  TESTING 


OFFICE  OF  ORDNANCE  RESEARCH,  U.S.  ARMY
BOX  CM,  DUKE  STATION 
DURHAM,  NORTH  CAROLINA 




OFFICE  OF  ORDNANCE  RESEARCH 
Report  No.  59-2 
August  1959 


PROCEEDINGS OF THE FOURTH CONFERENCE
ON THE DESIGN OF EXPERIMENTS IN ARMY RESEARCH
DEVELOPMENT AND TESTING


Sponsored by the Army Mathematics Steering Committee

conducted at

The Quartermaster Research and Engineering Center
Natick, Massachusetts
22-24 October 1958






TABLE OF CONTENTS


Foreword

Program

Welcome to Fourth Conference on Design of Experiments
in Army Research, Development, and Testing
By Dr. J. Fred Oesterling

Errors of the Third Kind in Statistical Consulting
By Dr. A. W. Kimball

The AASHO Road Test as an Example of Large Scale Tests
By Professor Carl F. Kossack

Multiple Correlation of Mechanical with Ballistic
Properties of Armor Plate
By Olga Sipes

Analysis of Cathode Interface Resistance Equipment
By M. H. Zinn

Experimental Designs for Bio-Assay with Pathogens *
By Ira A. DeArmon, Jr.

The Application of Experimental Designs to Radar
Systems Data
By E. Biser, Harvey Eisenberg, and George Millman

Effects of Ballistic and Meteorological Variations on
the Accuracy of Artillery Fire
By O. P. Bruno

Characteristics of Various Methods for Collecting
Sensitivity Data
By A. Bulfinch

Establishing and Testing Criteria for Trajectory
Smoothing
By Paul C. Cox

Problems of Analysis, (1) Individual Variability
(2) Interaction Effects
By A. M. Galligan

Evaluation by Indirect Means of Effects of Bacteria on
an Unchallenged Host
By Morris A. Rhian

Determination of Performance Criteria for Quartermaster
Corps Functions
By John K. Sterrett

Program for the Interlaboratory Determination of
Compression Set of Elastomers at Low Temperatures
By S. L. Eisler

An Appraisal of Sequential Analysis Under Conditions
Restricted by the Requirements for Advanced Scheduling
and Programming
By E. W. Larson and W. D. Foster

Simplified Computational Procedures for Estimating
Parameters of a Normal Distribution from Restricted
Samples
By Professor A. C. Cohen

Statistical Problems Associated with Missile Testing
By Dr. Charles L. Carroll, Jr.

Application of Sequential Type Design and Analysis to
Field Tests
By Harold R. Rush

A Discourse on a Sequential Observational Program Used
in a Study of a Response Surface for a Complex Weapons
System
By William J. Wrobleski

Some Statistical Aspects of Preference and Related Tests
By C. I. Bliss

Statistical Methods Applied to the Textile Industry
By L. H. C. Tippett


* This paper was presented at the Conference. It is not published in
these proceedings.


FOREWORD 


The Army Mathematics Steering Committee (AMSC) at its April 1958 meeting
accepted the invitation, issued by Dr. John K. Sterrett on behalf of
the Quartermaster Corps, to hold the Fourth Conference on the Design of
Experiments in Army Research, Development and Testing at the Quartermaster
Research and Engineering Center at Natick, Massachusetts. This meeting,
held 22-24 October 1958, was the first in this series of Army-wide
conferences to be conducted outside the Washington, D. C. area. Through these
symposia the AMSC hopes to introduce and encourage the use of the latest
statistical and design techniques into the research, development, and
testing conducted by Army scientific and engineering personnel. It is believed
that this purpose can be pursued best by holding these meetings at various
government installations throughout the country.

The five invited speakers at the Fourth Design Conference were C. I.
Bliss, A. C. Cohen, A. W. Kimball, C. F. Kossack, and L. H. C. Tippett.
Various aspects of preference studies, information on restricted samples,
errors of the third kind, and the American Association of State Highway
Officials road test were the topics discussed by the first four of these
men. The fifth speaker, L. H. C. Tippett, of the Shirley Institute,
Manchester, England, talked to the group on some of the statistical methods
now being applied in the textile industry. In addition to these addresses
there were nine papers presented in the Clinical Sessions and eight in the
Technical Sessions. Characteristics of sensitivity data, trajectory
smoothing, performance criteria, and advanced scheduling were a few of the topics
that came up for discussion in the Clinical Sessions. The papers presented
in the Technical Sessions covered a wide range of topics; examples of
problems dealt with included interface resistance in cathode tubes, properties
of armor plate, bio-assay with pathogens, field tests, radar systems, and
complex weapon systems.

The Fourth Conference was attended by 94 registrants and participants
from organizations. Speakers and panelists came from Boston University,
Connecticut Agricultural Experiment Station, Harvard University, Oak Ridge
National Laboratory, Princeton University, Purdue University, RCA Service
Company, Shirley Institute, University of Georgia, University of Michigan,
Virginia Polytechnic Institute, and 11 Army facilities. The present volume
is the Proceedings of this conference, and it contains 19 of the 21 presented
papers. The papers are being made available in the present form in order
to encourage wider use of modern statistical principles of the design of
experiments in research, development, and testing work of concern to the
Army.


The members of the Army Mathematics Steering Committee take this
opportunity to express their thanks to the many speakers and other research
workers who participated in the meeting; to Major General C. G. Calloway,
Commanding General of the Quartermaster Research and Engineering Center at
Natick, for making available the excellent facilities of his organization
for the Conference; and to Mr. J. Schaller who handled the details of the
local arrangements for the Conference.


Finally, the Chairman wishes to express his appreciation to his
Advisory Committee, W. G. Cochran, F. G. Dressel (Secretary), Churchill
Eisenhart, Landis Gephart, Frank Grubbs, Clifford Maloney, and J. K. Sterrett
for their help in organizing the conference.


S. S. Wilks

Professor of Mathematics
Princeton University


FOURTH CONFERENCE ON THE DESIGN OF EXPERIMENTS IN ARMY RESEARCH
DEVELOPMENT AND TESTING
22-24 October 1958

Quartermaster Research and Engineering Center

22 October 1958

REGISTRATION: 0930 - 1000 (Eastern Daylight Saving Time)

Lobby of the Administration Building

MORNING SESSION: 1000 - 1215 - Auditorium in the Administration Building

Chairman: Colonel George F. Leist, Ordnance Corps
Commanding Officer of the Office of
Ordnance Research

Introductory Remarks: Dr. J. Fred Oesterling, Acting
Scientific Director, Quartermaster Research
and Engineering Center, Quartermaster Research
and Engineering Command

Errors of the Third Kind in Statistical Consulting
Dr. A. W. Kimball, Oak Ridge National Laboratory

The AASHO Road Test as an Example of Large Scale Tests
Professor Carl F. Kossack, Purdue University

LUNCH: 1215 - 1345 - Cafeteria in the Administration Building

There will be two Technical Sessions conducted Wednesday afternoon.
The security classification of Session II is CONFIDENTIAL. No clearances
will be required for Session I.

TECHNICAL SESSION I: 1345 - 1600 - Conference Room in the Field House

Chairman: Walter Pressman, U. S. Army Signal Research
and Development Laboratory

Multiple Correlation of Mechanical with Ballistic
Properties of Armor Plate
Olga Sipes, Frankford Arsenal

Analysis of Cathode Interface Resistance Equipment
M. H. Zinn, U. S. Army Signal Research and
Development Laboratory

Experimental Designs for Bio-Assay with Pathogens
Ira A. DeArmon, Jr., Biological Warfare Laboratories,
U. S. Army Chemical Corps

TECHNICAL SESSION II: 1345 - 1600 - Room 200 in the Development Building

Security Classification - CONFIDENTIAL
Chairman: L. F. Nichols, Picatinny Arsenal

The Application of Experimental Designs to Radar
Systems Data
E. Biser, Harvey Eisenberg, and George Millman,
Systems Division, Surveillance Department, U. S.
Army Signal R & D Laboratory

Effects of Ballistic and Meteorological Variations on
the Accuracy of Artillery Fire
O. P. Bruno, Weapon Systems Laboratory

SOCIAL HOUR: 1630 - 1730 - Conference Room in the Field House

23 October 1958

Clinical Sessions A and B will run concurrently on Thursday morning.
The General Session Thursday afternoon will be followed by Technical
Sessions III and IV. No clearances are required for any of the papers
in the Thursday sessions.

CLINICAL SESSION A: 0930 - 1215 - Room 200 in the Development Building

Chairman: Joseph Weinstein, U. S. Army Signal
Research and Development Laboratory

Panel Members: A. C. Cohen, Jr., University of Georgia
A. Golub, Weapon Systems Laboratory
F. E. Grubbs, Weapon Systems Laboratory
Boyd Harshbarger, Virginia Polytechnic
Institute

Characteristics of Various Methods for Collecting
Sensitivity Data
A. Bulfinch, Picatinny Arsenal

Causes of Excess Dispersion in, and Optimum Components
for, 20 mm HEI Accuracy Firing
Benjamin Shratter, Lake City Arsenal

Establishing and Testing Criteria for Trajectory
Smoothing
Paul C. Cox, Reliability and Statistics Office,
Ordnance Mission, White Sands Missile Range

Problems of Analysis, (1) Individual Variability
(2) Interaction Effects
A. M. Galligan, Quartermaster Research and
Engineering Center

CLINICAL SESSION B: 0930 - 1215 - Conference Room in the Field House

Chairman: D. H. K. Lee, Quartermaster Research and
Engineering Center

Panel Members: G. E. P. Box, Princeton University
W. G. Cochran, Harvard University
G. E. Noether, Boston University
L. H. C. Tippett, Shirley Institute

Evaluation by Indirect Means of Effects of Bacteria on
an Unchallenged Host
Morris A. Rhian, Biological Warfare Laboratories,
U. S. Army Chemical Corps

Determination of Performance Criteria for Quartermaster
Corps Functions
John K. Sterrett, Research and Engineering Division,
Office of the Quartermaster General

Program for the Interlaboratory Determination of
Compression Set of Elastomers at Low Temperatures
S. L. Eisler, Rock Island Arsenal

An Appraisal of Sequential Analysis Under Conditions
Restricted by the Requirements for Advanced Scheduling
and Programming
E. W. Larson and W. D. Foster, Biological Warfare
Laboratories, U. S. Army Chemical Corps

LUNCH: 1215 - 1315 - Cafeteria in the Administration Building

GENERAL SESSION: 1315 - 1500 - Auditorium in the Administration Building

Chairman: Dr. John K. Sterrett
Office of the Quartermaster General

Simplified Computational Procedures for Estimating
Parameters of a Normal Distribution from Restricted
Samples
Professor A. C. Cohen, University of Georgia

Statistical Problems Associated with Missile Testing
Dr. Charles L. Carroll, Jr., RCA Service Company

TECHNICAL SESSION III: 1515 - 1600 - Room 200 in the Development Building

Chairman: Ernest M. Kenyon
Quartermaster Research and Engineering Center

Application of Sequential Type Design and Analysis to
Field Tests
Harold R. Rush, Quartermaster Field Evaluation Agency,
Quartermaster R & D Command

TECHNICAL SESSION IV: 1515 - 1600 - Conference Room in the Field House

Chairman: P. J. Loatman, Watervliet Arsenal

A Discourse on a Sequential Observational Program Used
in a Study of a Response Surface for a Complex Weapons
System
William J. Wrobleski, The University of Michigan

24 October 1958

Two invited speakers are scheduled to address the group on Friday
morning. Right after the noon meal your host for this conference will
conduct a tour of their installation.

GENERAL SESSION: 0930 - 1200 - Auditorium in the Administration Building

Chairman: Dr. Clifford J. Maloney, Chemical Corps
Research and Development Command

Some Statistical Aspects of Preference Studies
C. I. Bliss, Connecticut Agricultural Experiment
Station

Statistical Methods Applied to the Textile Industry
L. H. C. Tippett, Shirley Institute, Manchester,
England

LUNCH: 1200 - 1330 - Cafeteria in the Administration Building

TOURS: 1330 - 1500 - Lobby of the Administration Building

Tours of the recently dedicated Solar Furnace, the
Climatic Chambers, and special interest areas of the
Chemicals and Plastics, Environmental Protection
Research, Mechanical Engineering, Pioneering Research,
Textile, Clothing and Footwear Divisions will be
arranged for Friday afternoon. The number and extent
of the tours will be dependent on the time available.


WELCOME TO FOURTH CONFERENCE ON DESIGN OF EXPERIMENTS
IN ARMY RESEARCH, DEVELOPMENT, AND TESTING

J. Fred Oesterling

Headquarters Quartermaster Research and Development Command

Evolution in technical capability resembles biological evolution. In
the last analysis, significant advances on the broad scale are dependent
upon numerous small advances often occurring independently and sometimes
almost in random fashion. But progress historically appears to proceed by
saltation, rather than in the steady fashion that multiple independent
events might be expected to produce. It seems that there is a certain
dependence of discoveries on each other, or an interaction between
discoveries independently made, which leads at times to vigorous upsurge in total
effect, and occasionally to spectacular results.

It is very difficult to determine the historical significance of
events while they are actually taking place, since one's field of vision
scarcely exceeds the probable error of the events; but one might be
pardoned, I think, for feeling that we now stand in the presence of one of
these upsurges. The classical scientific procedure has been, and probably
must remain, predominantly analytical. In the past, analytical procedures
have largely had a deterministic base. One has tried to arrange events so
that only one variable is at work, and this by controlled and precisely
determined intervals. The limitations of this procedure were readily
apparent, but methods for circumventing its limitations have been slow
in coming. The turn of the century saw a rapid development in the basic
probabilistic handling of data, and permitted the use of probabilistic
experimental design. Today we see an expanding application of the
probabilistic outlook to a wide variety of problems, including those of
"naturally" occurring and uncontrolled events. No method can abstract
from the data information which they do not contain; but there has been
vast improvement in wringing from given data the maximum of information
that they do contain. Where it is not possible to control variables to
the extent that might be desired, or to make a fresh collection of better
data, the information contained in the available observations can be
largely abstracted and used as guides until better data are available.
The degree of reliability to be placed upon the emergent information is
probably as important a contribution as the information itself in
furnishing guidance.

We are well aware that the subject you are meeting to discuss over the
next three days, in spite of the esoteric titles of some papers, is
methodologically of very great importance to the Quartermaster R&E Command.
Insofar as we can, we do design and conduct highly controlled experiments;
but so much of the QM operations is not susceptible to this type of
examination, and the application of experimental results to actual operation is
often far from a straightforward matter. Dr. Sterrett represents the
spearhead of a modern mathematical attack upon our problems within the QM R&E
structure. This will be extended, and we look forward to considerably
better solutions to our numerous problems whether research, developmental,
or operational. To this objective your deliberations cannot fail to make
a material contribution.


ERRORS  OF  THE  THIRD  KIND  IN  STATISTICAL  CONSULTING* 


A.  W.  Kimball 

Oak  Ridge  National  Laboratory 

Because graduate students in statistics are given little, if any,
preparation for actual consulting, they are prone, particularly in
their early years, to commit errors of the third kind, many of which
could be avoided if the students were properly trained. Errors of
the third kind are defined and are illustrated with actual examples
from consulting experience. The cases used represent types of error
which result from different situations that arise frequently in
practice. Some discussion is included of possible remedies for
this problem that are suggested by the experience of educators in
other fields.


INTRODUCTION 

At a relatively early age in their graduate academic life, students of
statistics become familiar with certain risks associated with what they come
to know as the first and second kinds of error in the theory of testing
hypotheses. They soon learn that in many widely used statistical tests the first
kind of error is easy to control but that often the risk of the second kind of
error is difficult to compute and more often neglected entirely in practice.
The importance of these errors is constantly brought to their attention through
emphasis in their course work on such things as uniformly most powerful tests
and sequential procedures which control the risks of both kinds of error. More
recently the theory of decision making, the natural sequel to hypothesis
testing, has elevated the notion of risk to an even higher place in the hierarchy
of ideas passed on from professor to student.
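
As a rough illustration of the two risks just mentioned (a sketch added for
the reader, not part of the original paper), consider a one-sided test of a
normal mean with known standard deviation. The first kind of risk is fixed
in advance by the choice of critical value; the second must be computed
against a specific alternative. The function name and all numerical values
below are hypothetical.

```python
from statistics import NormalDist


def error_risks(mu0, mu1, sigma, n, alpha):
    """One-sided z-test of H0: mu = mu0 against H1: mu = mu1 > mu0.

    Returns the critical value for the sample mean and beta, the risk
    of the second kind of error when the true mean is mu1.
    """
    std_err = sigma / n ** 0.5
    # The first kind of risk (alpha) is controlled by construction:
    # H0 is rejected when the sample mean exceeds this critical value.
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * std_err
    # The second kind of risk: the sample mean falls below the
    # critical value even though the true mean is mu1.
    beta = NormalDist(mu1, std_err).cdf(critical)
    return critical, beta


# Hypothetical numbers chosen only for illustration.
critical, beta = error_risks(mu0=0.0, mu1=0.5, sigma=1.0, n=25, alpha=0.05)
print(f"critical value = {critical:.3f}, beta = {beta:.3f}")
```

Note that beta depends on the alternative mu1 and on the sample size, which
is one reason the second kind of risk is harder to state than the first.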

As a result of these teachings many of today's statistics graduates
come away from the warm comfort of university complacence into the coldly
realistic outside world imbued with the idea (and probably rightly so)
that the statistician's only real function in this world is to compute
risks of error for other people who have to make decisions. To be sure,
there is a vast amount of planning (design of experiments, model building)
and intermediate adjustment (missing data, extreme observations) necessary
before the statistician can estimate these risks, but essentially this is
his main task, and the student finds it out usually before the end of the
first semester.

Consider then the embryo statistician who has been released from the
university's uterus with a shiny new degree and who proceeds on his mission
as a risk computer fully equipped with the tools of his trade and the mental
wherewithal to apply them. Let us assume that during the first few years of
his initiation as a consulting statistician he is lucky, from a mathematical
statistics point of view, and computes correctly the risks of error for all
problems he tackles. The chances are, speaking nonmathematically, that
during this time he will commit the third kind of error more often than he or
anyone else realizes. What is even more tragic is that, although as a student
he was constantly reminded of the importance of the first two kinds of error and
duly vowed always to keep sight of them, he was probably never made aware of the
existence of a third kind of error, let alone told what to do about it.

*This paper was originally published in the Journal of the American
Statistical Association, vol. 52, no. 278 (June, 1957). Permission to
reproduce it here is greatly appreciated by the editors.

The purpose of this paper is to draw attention to the third kind of error
by quoting actual examples in which the error was made and later rectified. The
hope is that the paper will serve simultaneously as a warning and as a
moderator for newly trained consultants who tend to descend on research workers with
the sometimes frightening enthusiasm and confidence of a freshman at his first
football practice, and that perhaps it will help stimulate responsible
educators to move more rapidly in filling this wide gap in graduate statistics
training. Most conscientious teachers of statistics recognize this need and are
searching for effective methods of correcting the situation, but very little
real progress has been made.

In this connection there is an interesting analogy between graduate
statistical training and medical training. The physician of today, after he completes
internship and residency, is well trained to practice medicine but not so well
trained to do research. This fact is recognized by many schools in which the
M. D. who wants to do research in physiology is advised to get a Ph.D. in this
field after he completes medical school. The emphasis in medical school is on
practice since most medical graduates never see the inside of a research
laboratory. The graduate statistician, on the other hand, is for the most part well
trained to "go into practice," that is, to do statistical consulting. A safe
guess is that over half of the graduates in statistics each year are lured into
industry or government where their principal work is consulting, and those who
do go to universities frequently find their nonteaching time fully occupied with
consulting both on and off campus. It is of utmost importance, therefore, that
the third kind of error in statistical consulting be emphasized and brought out
into the open. Otherwise nothing may ever be done about it.

THE  ERROR  OF  THE  THIRD  KIND 

A simple and almost ludicrous definition of the error of the third kind is
the error committed by giving the right answer to the wrong problem. In defining
it this way we are allowing the statistician the benefit of the doubt by
rejecting the possibility that he would give the wrong answer to the wrong question.
We are also protecting ourselves against the occurrence of a false positive, that
is, the situation in which the wrong answer to the wrong problem turns out to be
the right answer to the right problem. At this point the reader who finished
the introduction without succumbing to the temptation to look ahead for a
definition may well feel like the reader of a murder mystery who on the last page
discovers that the victim committed suicide. Why, he may ask, should we concern
ourselves with any consulting statistician who could be stupid enough to commit
such an error? Admittedly, there may be many mature statisticians who prefer to
take this attitude rather than face the consequences of accepting its alternative.
If this is so, the situation is indeed a grave one.

There is no way of knowing how many of us, particularly in our early years
as consultants, were guilty of errors of the third kind, but it is almost
certain that few have escaped an occasional mistake of this nature. The reason
is simple enough. Many of us, in good faith, have helped research workers
make t-tests, or compute analyses of variance, or design experiments thinking
we were giving the right answer to the right problem; and usually we do give the
right answer to the question that is asked. Unfortunately it often happens that
the question asked has little bearing on the real problem, and we are led into
committing the third kind of error.

A stranger to the intimacies of statistical consulting might well doubt that
such ridiculous events could ever occur, but the experienced statistician knows
that they do occur and will probably never be completely eliminated. Basically,
errors of the third kind are caused by inadequate communication between the
consultant and the research worker. In some instances, the research worker is at
fault for failing to discuss his problem in complete perspective. He may feel
that the statistician is weak in the subject matter field and that any attempt
at a complete explanation would be a waste of time; or he may not have his ideas
completely crystallized and may not want to be "confused" by a mathematician;
or he may know a little statistics and feel that he can state the question
adequately himself; or he may simply not want to take up too much of the consultant's
time. At the same time the statistician is at fault for not becoming sufficiently
familiar with the problem to enable him to advise intelligently. With proper
preparation, sufficient patience, and persistent questioning of the experimenter,
the consultant should be able to avoid most errors of the third kind, but not
until he recognizes that they exist. In the next section an attempt is made to
show that such errors can happen and under circumstances that ordinarily would
not be regarded as unusual or bizarre.

EXAMPLES OF ERRORS OF THE THIRD KIND

The material for these examples is drawn for the most part from the
author's own experience, with the natural result that most of the problems
come from the field of biology. The main theme of the paper, however, is
not biological and except for weakness in the subject matter field, either
on the part of the author or the reader, the message should be clear. It
should not be inferred that the errors illustrated are necessarily those of
the author, although he would not deny this possibility.

Example I. An engineer was engaged in particle size determinations in
connection with corrosion studies. He wanted to estimate the particle size
distribution, which he was willing to assume normal, but his method prevented him
from observing particle sizes below a certain diameter. He knew very little
about statistics but he had heard that there were ways of estimating
distributions when samples are restricted. There was no statistician in his own
group to whom he could turn for help, but there was one nearby who, although
very busy, might give him a reference.

So he visited the statistician and presented him with the following sample
of particle sizes: 25.6, 7.1, 5.1, 4.2, 3.7, 3.0, 2.6, 2.0, 1.8, 1.6, 1.5, 1.4,
1.3, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7 - and pointed out that his method would not
allow him to determine particle sizes less than 0.7. Assuming the distribution
normal, he wanted to know how he could estimate its mean and variance. The
statistician was indeed quite busy and not inclined to spend much time on a
problem he knew very little about and which did not originate in his group.
On the other hand he did not want to cause any ill feelings by refusing to
give any help at all. An easy way out was simply to hand the engineer one of
his many reprints on truncated normal distributions (after all the engineer
had asked for a reference), and this he did. Both participants in this short
conference went away happy, the engineer because he thought he had an answer to
his problem and the statistician because he disposed of an uninteresting problem
in short order. But, as any reader who carefully inspected the "sample" of
particle sizes already knows, an error of the third kind was committed. It might
easily have gone unnoticed indefinitely, as do many others, but fortunately this
error was caught.

The engineer returned to his desk armed confidently with the newly acquired
reprint and began to apply the method with the help of his 1935 model calculator.
He had not gotten very far along before he found that one of the statistics he
computed was far outside the range of a key table given in the reprint to
facilitate solution of the equations. After checking for and finding no
arithmetical inaccuracies, he reluctantly returned to the statistician who
inwardly was not too happy to see the engineer back. This conference lasted
longer than the first, and with great chagrin the statistician finally realized
what a stupid blunder had been made.

Among the methods used in particle size determination is one known as the
sedimentation method. Briefly, it consists of the preparation of a liquid
suspension of the material to be analyzed and the measurement of the decrease in
concentration of particles at or above a particular level in the suspension as
sedimentation proceeds. Under suitable conditions, Stokes' law can be used to
compute the percentage of particles in the suspension having diameters greater
than d, say, where the value of d is determined by the time elapsed after
sedimentation starts. Thus the random variables are the percentages, and d is a
fixed or independent variate. It was this technique that the engineer had used.
The appropriate method of estimation is, of course, probit analysis or one of its
counterparts, and the "truncation" is not a problem except insofar as it increases
the errors of estimate.
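
The distinction can be made concrete with a small sketch in Python. It treats the
diameters as fixed and the observed fractions as the responses, and fits a
straight line on the probit scale. Everything specific here is an assumption made
for illustration: the cut-off diameters, the "true" mean and standard deviation
used to generate the fractions, and the use of unweighted least squares, which is
only a simplified counterpart of a full probit analysis and not the procedure of
the reprint or of the original consultation.

```python
from statistics import NormalDist


def probit_line_fit(diameters, frac_greater):
    """Estimate (mu, sigma) from fractions of particles with diameter > d.

    Under normality, P(D > d) = 1 - Phi((d - mu) / sigma), so the probit
    z = Phi^{-1}(1 - P) is a straight line in d: z = d / sigma - mu / sigma.
    """
    std = NormalDist()
    # Fractions of exactly 0 or 1 have no finite probit and are skipped.
    pts = [(d, std.inv_cdf(1.0 - p))
           for d, p in zip(diameters, frac_greater) if 0.0 < p < 1.0]
    n = len(pts)
    mean_d = sum(d for d, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in pts)
    sxz = sum((d - mean_d) * (z - mean_z) for d, z in pts)
    slope = sxz / sxx                      # estimates 1 / sigma
    intercept = mean_z - slope * mean_d    # estimates -mu / sigma
    return -intercept / slope, 1.0 / slope


# Hypothetical cut-off diameters, fixed by the experimenter (not random).
diameters = [0.7, 1.0, 1.5, 2.0, 3.0, 4.2, 5.1]
# Synthetic, noise-free fractions generated from an assumed N(2.0, 1.5**2).
assumed = NormalDist(mu=2.0, sigma=1.5)
fractions = [1.0 - assumed.cdf(d) for d in diameters]

mu_hat, sigma_hat = probit_line_fit(diameters, fractions)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

With real data the fractions would carry sampling error, and a weighted or
maximum likelihood fit would be the more appropriate choice.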

If the statistician had been familiar with particle size methods, or even if
he had carefully scrutinized the "sample" that was presented to him, the error
could never have occurred. It might be argued that both parties to this
near-fiasco were the victims of circumstance and not really responsible, but if
we are honest we must admit that the statistician has a duty to be more careful
in avoiding this kind of error than perhaps any other. If he commits an error
of the third kind, he is no less at fault than the physician who inadvertently
administers arsenic instead of aspirin.

Example II. A geneticist working in the field of radiation biology became
interested in the relative biological effects of different kinds of radiation.
In one experiment he hoped to compare the effects of gamma radiation and neutron
radiation by exposing two groups of organisms separately to graded doses of each
kind of radiation and then determining the frequency of mutations at each dose.
In previous experiments it had been found that mutation frequencies increase
linearly with dose, so he planned to evaluate the relative biological effect
by a comparison of the two slopes for the two kinds of radiation.

After the experiment was completed, he visited a newly hooded statistician
and asked him to estimate the two slopes and make a statistical test of the
difference between them. He explained that the gamma source used in the
experiment was radioactive cobalt which provided an essentially pure source of
gamma rays, but that the neutron experiment was carried out in a cyclotron and
he had "corrected" the neutron doses for a known gamma ray contamination of about
7 per cent. The young statistician, who had little or no experience with
radiation experiments and who at the moment was not particularly interested in
learning about radiation, proceeded promptly and, as it turned out, rashly with
his analysis. From the biologist he had obtained the following data:

Gamma experiment (i = 1, ..., n)

    $y_i$ = proportion of mutations
    $x_i$ = dose of gamma radiation

Neutron experiment (j = 1, ..., m)

    $u_j$ = proportion of mutations
    $v_j$ = "corrected" dose of neutron radiation.

Originally there were several replications at each dose point and the
statistician had carefully tested for homogeneity. Finding no significant
departure from binomiality, he pooled the replications and proceeded with a
weighted linear regression for each experiment. He ended up with the two equations

\[ \hat{y} = a + b x, \qquad \hat{u} = a_n + b_n v, \]

for the gamma and neutron experiments, respectively. Finally he made the
requested test of significance and chalked up (he thought) another successfully
completed problem.


The third kind of error made by this statistician was most certainly avoidable.
He had only to question the geneticist about the nature of the "correction"
of the neutron dose, and without having to learn much at all about radiation
dosimetry, he would have discovered his error. The consulting statistician,
particularly in the physical science and engineering fields, soon learns to
question any "corrections" applied by the experimenter before the data are
presented for analysis. In the problem at hand it turned out that the geneticist
had simply reduced the original neutron dose by 7 per cent intending thereby to
evaluate the effect of neutrons uncontaminated by gamma rays. Overlooked was the
fact that the corresponding biological effect still included the gamma component.
When the error was uncovered, a somewhat different approach was taken. The two
experiments were analyzed simultaneously by minimizing


\[ S = \sum_{i=1}^{n} \omega_i (y_i - \hat{y}_i)^2 + \sum_{j=1}^{m} \nu_j (u_j - \hat{u}_j)^2, \]

where

\[ \hat{y}_i = a' + b' x_i, \]
\[ \hat{u}_j = a_n' + b'(0.07 w_j) + b_n'(0.93 w_j), \]

where the uncorrected neutron doses ($w_j$) were determined from the relation
$v_j = 0.93 w_j$, and where $\omega_i$ and $\nu_j$ are the appropriate weights.
Needless to say, the second approach yielded estimates and standard errors
somewhat different from those of the first approach, and the new significance
test had to allow for the covariance between $b'$ and $b_n'$.
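
A minimal sketch of this simultaneous fit, in Python, may help to show why the two
slopes are no longer estimated independently. It is an illustration only: the
function names, the doses, the parameter values used to generate the data, and the
unit weights are all invented here, and the normal equations are solved by plain
Gaussian elimination rather than by whatever computing method was actually used.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def joint_fit(x, y, wt_y, v, u, wt_u, gamma_frac=0.07):
    """Weighted least squares for (a', b', a_n', b_n') in the joint model.

    Gamma rows:   y = a'   + b' * x
    Neutron rows: u = a_n' + b' * (gamma_frac * w) + b_n' * ((1 - gamma_frac) * w)
    where w = v / (1 - gamma_frac) is the uncorrected neutron dose.
    """
    rows, resp, wts = [], [], []
    for xi, yi, wi in zip(x, y, wt_y):
        rows.append([1.0, xi, 0.0, 0.0])
        resp.append(yi)
        wts.append(wi)
    for vj, uj, wj in zip(v, u, wt_u):
        dose = vj / (1.0 - gamma_frac)
        rows.append([0.0, gamma_frac * dose, 1.0, (1.0 - gamma_frac) * dose])
        resp.append(uj)
        wts.append(wj)
    k = 4
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, wts)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, wts, resp))
            for i in range(k)]
    return solve_linear(xtwx, xtwy)


# Synthetic, noise-free data from assumed parameter values (illustration only).
true_params = (0.010, 0.020, 0.012, 0.060)       # a', b', a_n', b_n'
x = [1.0, 2.0, 3.0, 4.0]                          # gamma doses
raw = [0.5, 1.0, 1.5, 2.0]                        # uncorrected neutron doses
v = [0.93 * w for w in raw]                       # "corrected" doses as reported
y = [true_params[0] + true_params[1] * xi for xi in x]
u = [true_params[2] + true_params[1] * 0.07 * w + true_params[3] * 0.93 * w
     for w in raw]

estimates = joint_fit(x, y, [1.0] * 4, v, u, [1.0] * 4)
print([round(e, 6) for e in estimates])
```

Because the gamma slope enters the neutron rows as well, the estimates of the two
slopes share information, which is the source of the covariance mentioned above.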

Once again in this example the blame must rest primarily with the statistician.
Perhaps in his eagerness to apply his newly acquired skills to a problem
which he thought fell into a pattern he had seen in graduate school, he
temporarily lost his common sense. Whatever the explanation it is hard to draw any
conclusion other than one which reflects the fact that he was just not ready to
do statistical consulting on his own.

Example III. This example illustrates in a sort of general way a situation which
must occur many times in the life of every consulting statistician. It might be
called "Consulting by remote control," or "Communication without representation."
Frequently the situation arises in a manner similar to the one in this example.

A research worker who, mostly through experience, had become fairly adept
with many text-book statistical methods, encountered a problem which was new to
him and which he could not find in his elementary text-book. He had computed
two product-moment correlation coefficients and wanted to test the hypothesis
that the population correlations were equal. He was reasonably sure that the
t-test would not be appropriate, but he was also sure that some method must
exist. The research organization to which he belonged did not employ a
statistician, but he had a statistician friend in the same city who he felt would
certainly have the answer. For such a minor problem the trip across town was
hardly worthwhile, but thanks to Alexander Graham Bell, he knew he could solve his
problem without leaving his desk. The phone call was made and the statistician,
not wanting to be impolite or difficult by suggesting a meeting in person, and
being allergic to long telephone conversations, quickly told his friend about the
z-transformation and where to find an example of its use.

Sometime later both men happened to attend the same local seminar, and upon
seeing his friend, the research worker rushed over to thank him for the useful
advice about the z-transformation. During the course of the conversation, the
statistician discovered to his horror that the experimenter had taken N
simultaneous observations on three mutually correlated variables, x, y and z, and
the two correlation coefficients which had been the subject of the aforementioned
telephone conversation turned out to be the correlations between x and z and
between y and z. With much embarrassment he realized that he had recommended
a t-test between two z-transformed correlation coefficients which were not
independent. Summoning up all his courage he confessed his mistake and referred
the experimenter to the paper by Hotelling [1] in which it is shown that under
the null hypothesis, $\rho_{xz} = \rho_{yz}$,

\[ t = (r_{xz} - r_{yz}) \sqrt{\frac{(N - 3)(1 + r_{xy})}{2D}} \]

is distributed approximately as "Student's" t with N - 3 degrees of freedom,
where

\[ D = \begin{vmatrix} 1 & r_{xy} & r_{xz} \\ r_{xy} & 1 & r_{yz} \\ r_{xz} & r_{yz} & 1 \end{vmatrix}. \]
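
The statistic is easy to compute once the three sample correlations are in hand.
The following Python sketch is offered only as an illustration; the function name
and the numerical correlations are invented, and the determinant is written out
in its expanded form.

```python
import math


def hotelling_t(r_xz, r_yz, r_xy, n):
    """Hotelling's t for comparing two correlations that share the variable z.

    Returns (t, degrees_of_freedom). D is the determinant of the 3 x 3
    correlation matrix of x, y and z, expanded by hand.
    """
    det = 1.0 - r_xy ** 2 - r_xz ** 2 - r_yz ** 2 + 2.0 * r_xy * r_xz * r_yz
    t = (r_xz - r_yz) * math.sqrt((n - 3) * (1.0 + r_xy) / (2.0 * det))
    return t, n - 3


# Hypothetical sample correlations from N = 103 simultaneous observations.
t_stat, df = hotelling_t(r_xz=0.6, r_yz=0.4, r_xy=0.5, n=103)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The value of t would then be referred to a table of "Student's" distribution with
the stated degrees of freedom.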



The experimenter tried to accept the blame for this mistake contending that he
should have taken the time to explain the actual problem more completely. Actually
in this error of the third kind it would appear that both parties were at fault
and for essentially the same reason - neither wanted to take the time to find out
what the other was really doing.

Example IV. It seems desirable to include, as one of the examples of errors of
the third kind, an error of omission. Essentially these errors occur when the
statistician fails to do the best job possible simply because he has not taken
enough time to question the research worker thoroughly about his experiment. In
these cases, the answer given is often the right answer to the right problem but
not always the best right answer. The following example illustrates an error of
this kind.

A geneticist was engaged in a series of recombination experiments with
bacteriophage T4. He was interested in testing for independence of the occurrence
of two markers, r and tu. Under the hypothesis of independence, in an experiment
in which plaques are counted for all four types of progeny, the observed and
expected plaque counts can be represented as shown in the following table:


PLAQUE COUNT FREQUENCIES

                              Type of Progeny
Frequency     Parental       r+             tu+            r+tu+          Total

Observed      $n_1$          $n_2$          $n_3$          $n_4$          M
Expected      $M q_1 q_2$    $M p_1 q_2$    $M q_1 p_2$    $M p_1 p_2$    M

where $p_1$ and $p_2$ are the probabilities of events leading to recombinants r+
and tu+, respectively, and $q_1 = 1 - p_1$, $q_2 = 1 - p_2$. Typical experiments of
this type yield about 90 per cent of parental type progeny and 10 per cent
recombinants.

The geneticist who was doing these experiments had had some experience
using chi-square in testing for independence with genetic frequency data, but
since there were two parameters to be estimated in this case, he was not quite
sure how to proceed. So he visited a young biometrician and presented him with
data of the type shown in the above table. After explaining the experiment, he
mentioned casually that he had much more data from another replication of this
experiment but that it would probably be of little use since not all of the
four classes of progeny were counted.

Perhaps it was too early in the morning, or perhaps the biometrician had
his mind on something else. In any event he ignored the experimenter's casual
remark about the other replication, proceeded to obtain maximum likelihood
estimates of the parameters $p_1$ and $p_2$ from the complete experiment and
correctly computed a chi-square with one degree of freedom which provided the
required test for independence.
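
For the complete experiment alone the calculation is the familiar one, and a short
Python sketch shows its shape. The plaque counts are invented for illustration
(chosen so that about 90 per cent are parental), the function name is hypothetical,
and no continuity correction is applied.

```python
import math


def independence_chi_square(n1, n2, n3, n4):
    """Counts are parental, r+, tu+ and r+tu+ plaques, in that order.

    Returns (p1, p2, chi_square, p_value) for the test of independence.
    """
    m = n1 + n2 + n3 + n4
    p1 = (n2 + n4) / m            # ML estimate: share carrying the r+ event
    p2 = (n3 + n4) / m            # ML estimate: share carrying the tu+ event
    q1, q2 = 1.0 - p1, 1.0 - p2
    expected = [m * q1 * q2, m * p1 * q2, m * q1 * p2, m * p1 * p2]
    observed = [n1, n2, n3, n4]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return p1, p2, chi_square, p_value


# Hypothetical plaque counts, about 90 per cent parental.
p1_hat, p2_hat, chi_sq, p_val = independence_chi_square(905, 45, 40, 10)
print(f"p1 = {p1_hat:.3f}, p2 = {p2_hat:.3f}, chi-square = {chi_sq:.2f}")
```

Four classes less one constraint for the total and two estimated parameters leave
the single degree of freedom.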




The results of the test were somewhat inconclusive, at least in the mind of
the experimenter, and he began to reflect on why he had done the second
replication in the first place. The greatest labor in experiments of this type is
the counting of plaques, and since about 90 per cent of them represent parental
type progeny, most of the work is done in counting plaques which provide little
information about independence. It seemed reasonable to him, therefore, to do an
experiment in which only the recombinants were counted. This was the second
replication which he had mentioned to the statistician and it was about twice
the size of the first.

With these points in mind he returned to the statistician and asked
specifically if there wasn't some way in which the information from the second
replication could be combined with the first so as to provide a more sensitive
test for independence. As a result of this gentle prodding by the experimenter,
who was obviously thinking more clearly than our young biometrician, an approach
was found which would make use of all the data. The result of the second
experiment was representable as:


PLAQUE COUNT FREQUENCIES

                              Type of Progeny
Frequency   Parental   r+                            tu+                           r+tu+                         Total

Observed    --         $n_5$                         $n_6$                         $n_7$                         N
Expected    --         $N p_1 q_2 / (1 - q_1 q_2)$   $N q_1 p_2 / (1 - q_1 q_2)$   $N p_1 p_2 / (1 - q_1 q_2)$   N

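Under the hypothesis of independence the joint probability of both samples is

\[ \frac{M!}{n_1!\, n_2!\, n_3!\, n_4!} \cdot \frac{N!}{n_5!\, n_6!\, n_7!} \cdot \frac{(q_1 q_2)^{n_1} (p_1 q_2)^{n_2 + n_5} (q_1 p_2)^{n_3 + n_6} (p_1 p_2)^{n_4 + n_7}}{(1 - q_1 q_2)^{N}}. \]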


The maximum likelihood equations for $p_1$ and $p_2$ can be reduced to a quadratic
equation in $p_2$ with only one admissible root, and an equation in $p_1$ which is
linear in $p_2$. A chi-square with three degrees of freedom is then easily
computed. In this particular experiment the added strength of the second
replication was sufficient to convince the geneticist that he had no reason to
suspect lack of independence, whereas the significance level of chi-square based
on the first replication alone had left him in doubt.
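
The reduction just described can be sketched in Python. The algebra in the
comments is worked out here from the joint probability of the two samples and is
an assumption of this sketch, not a transcription of the original computation;
the counts are synthetic, constructed to agree exactly with $p_1 = 0.06$ and
$p_2 = 0.05$ so that the answer is known in advance.

```python
import math


def combined_fit(first, second):
    """first  = (n1, n2, n3, n4): parental, r+, tu+, r+tu+ (all classes counted).
    second = (n5, n6, n7): r+, tu+, r+tu+ (recombinants only).
    Returns (p1, p2, chi_square) under the hypothesis of independence.
    """
    n1, n2, n3, n4 = first
    n5, n6, n7 = second
    m, n = sum(first), sum(second)
    a = n2 + n4 + n5 + n7      # total exponent of p1 in the likelihood
    c = n3 + n4 + n6 + n7      # total exponent of p2 in the likelihood
    # Setting both log-likelihood derivatives to zero gives p1 = (a / c) * p2
    # and the quadratic  m*a*p2^2 - (a*c + m*c + m*a)*p2 + c*(a + c - n) = 0.
    qa = m * a
    qb = -(a * c + m * c + m * a)
    qc = c * (a + c - n)
    root = math.sqrt(qb * qb - 4.0 * qa * qc)
    candidates = [(-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)]
    # The admissible root leaves both probabilities strictly between 0 and 1.
    p2 = next(r for r in candidates
              if 0.0 < r < 1.0 and 0.0 < r * a / c < 1.0)
    p1 = p2 * a / c
    q1, q2 = 1.0 - p1, 1.0 - p2
    rec = 1.0 - q1 * q2        # probability that a plaque is a recombinant
    expected = [m * q1 * q2, m * p1 * q2, m * q1 * p2, m * p1 * p2,
                n * p1 * q2 / rec, n * q1 * p2 / rec, n * p1 * p2 / rec]
    observed = list(first) + list(second)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p1, p2, chi_square


# Synthetic counts that fit p1 = 0.06, p2 = 0.05 exactly (illustration only).
first_counts = (8930, 570, 470, 30)
second_counts = (1140, 940, 60)
p1_all, p2_all, chi_all = combined_fit(first_counts, second_counts)
print(f"p1 = {p1_all:.4f}, p2 = {p2_all:.4f}, chi-square = {chi_all:.4f}")
```

Seven classes, less two constraints for the totals and two estimated parameters,
leave the three degrees of freedom.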


Perhaps there are only a few young statisticians who would commit an error
of this kind, but the temptation must be great in many practical situations for
the new consultant to discard extra observations which make the pattern of an
experiment look different from what he has been accustomed to seeing in class
examples. We so often hear it said that many research workers never come to
the statistician until after the experiment is completed, and that frequently
much of the data is worthless for statistical analysis. Certainly this does
happen more often than it should, but in many apparently hopeless cases it also
happens, as in the foregoing example, that a little extra effort on the part of
the consultant will yield a workable, relatively simple method of analysis. A
feel for these situations comes only with experience, but the graduate student
should be given a chance to get some of this experience before he starts out
completely on his own.

A POSSIBLE SOLUTION TO THE PROBLEM

Many readers may object to the examples which were chosen to illustrate
errors of the third kind as being unrealistic and unlikely to happen in actual
practice. To a large extent they are right because all of the errors discussed
were eventually corrected and hence no longer qualify as errors. But it should
be obvious that the only errors of the third kind which become known are those
which are corrected, and for every one which is corrected there must be many
which we will never know about. If we are ready to admit that these errors are
committed and perhaps in large numbers, then we should also be ready to do
something about it.

The obvious place to start is in graduate schools where degrees in statistics
are awarded to students who expect to do statistical consulting. For some
time to come these institutions will provide the largest part of the supply of
consulting statisticians. If the consulting statistician were required by law
to obtain a license before he could go into practice, we could take our cue from
the medical profession. Every statistics graduate who expects to consult would
be required to intern for, say, one year, and at the end of this time would be
required to take an examination to obtain his license. This arrangement might
or might not prove satisfactory but most people would admit that it is not
practicable, at least not in the foreseeable future.

Let us turn then to the teaching profession. In many states licenses to
teach are either not required or can be obtained merely by payment of a fee,
and the teachers colleges, in addition to providing a comprehensive curriculum
of course work, must somehow prepare students for actual teaching. They
accomplish this by the long established requirement of practice teaching. Every
conscientious teachers college includes as part of its curriculum a period in
which the student leaves the campus and under the direction of an experienced
teacher learns to teach by teaching. In some schools practice teaching begins
at the junior level, and college administrators have found that there is
absolutely no substitute for it. Why then should not the statistics student be
required to learn to consult by consulting?

Some statistics departments have attempted to achieve this goal by having
the student "sit in" on consultations held by members of the staff. This
undoubtedly helps to some extent, but frequently the student participates very
little in the discussion and some staff members complain that their clients are
reluctant to talk in the presence of graduate students. Whereas attendance at
staff consultations may serve to introduce the student to the complexities of
consulting, he can never learn to cope with them until he tries it on his own.
To achieve this opportunity it is imperative that he leave the campus and "intern"
in the field.

Exactly how this can best be accomplished is anybody's guess. As a start it
would seem that graduate schools should attempt to obtain affiliations with
consulting groups in government and industry, much as medical schools are
affiliated with hospitals, or teachers colleges with practice schools.
Universities contribute heavily to government and industry through the medium of
the research contract. Both parties benefit, of course, even under the present
system, but certainly both would benefit more in the long run if programs of
student participation could be arranged. There must be many instances in which
essentially this sort of arrangement has been made and proved successful, but only
for an isolated student here and there. To be really effective such a program
would have to be made an integral part of the graduate curriculum and listed in
the catalog as one of the requirements for a degree.

Those of us in the profession of statistical consulting who take honest pride
in our work face a real challenge. Two avenues are open to us. One is to ignore
the presence of this situation and to continue along our narrow paths of
individual self-satisfaction, oblivious of the effect it might have on the future
of our profession. If this course is followed, when the production rate of new
statisticians begins to catch up with the demand, we will face loss of prestige
and public confidence, and possibly even virtual extinction. The other avenue
is to recognize the problem, to appreciate that it is constantly increasing in
intensity and to push hard for positive action as soon as practicable. We should
have begun yesterday; today we are only thinking about it; tomorrow we must act.

REFERENCE

[1] Hotelling, Harold, "The selection of variates for use in prediction with
some comments on the general problem of nuisance parameters," Annals of
Mathematical Statistics, 11 (1940), 271-83.



THE AASHO ROAD TEST AS AN EXAMPLE OF
LARGE SCALE TESTS

Carl  F.  Kossack 
Purdue  University 

The AASHO Road Test is an extensive study of phenomena which arise when
highway pavements of varying structural designs are subject to specified traffic.
The experiment is sponsored by the American Association of State Highway
Officials, the AASHO, and is administered by the National Academy of Sciences
through its Highway Research Board.

The Road Test may be the largest experiment in history in which statistical
designs have been attempted. Geographically, the Road Test is contained in an
area about 300 feet wide and eight miles long near Ottawa, Illinois. Treatments
and observations cost, on the average, about fifty thousand dollars per
experimental unit, since there is a budget of over twenty million dollars for more
than four hundred pavement units. Controlled treatments were begun in 1956 and
will be continued into 1960. Many groups serve to advise the administrators and
staff of the Road Test. One such group is the Statistical Advisory Panel, of which
I am chairman, with W. J. Youden, National Bureau of Standards, and K. A. Brownlee,
University of Chicago, the other two members. Dr. Paul Irick is the full time
senior statistician associated with the Field Office of the Road Test.

In considering the experimental design for such an extensive experiment as
the Road Test, it seems necessary first to consider the formal structure of an
experiment and then to attempt to describe the Road Test with respect to these
more general views.

Following  the  approach  used  by  Dr.  Irick,  one  can  consider  the  structure  of 
an  experiment  from  five  different  but  interrelated  aspects: 

(1)  Objectives 

(2)  Designs  for  Data  Acquisition 

(3) Experimental Data

(4)  Models  for  Association  among  experimental  variables 

(5)  Analyses  of  the  data 

Let us consider first the problem of Objectives. Although experimental
objectives generally call for the discovery or demonstration of associations
among observable phenomena, explicit objectives must often be inferred from
general statements of purpose. This inference often makes any consideration of the
consistency of objectives with the remaining aspects of the experiment a matter
of interpretation. This point can bear careful consideration in most experiments
since all too often the general purposes are vague and ambiguous in their
expression if in fact they are even stated. One trouble with a large-scale
experiment in this connection is that the investment of so much time and energy
makes the interpretation of these objectives most critical, since one usually does
not have the option of simply modifying the experimental setup on the next time
around if it is discovered that one has misinterpreted the objectives. We can thus
note the first characteristic of a large-scale experiment that distinguishes it
from other experiments. That is, the interpretation of the general purpose into
specific objectives can rarely be evolved sequentially as the experiment
progresses but must be clearly determined in advance of the actual acquisition of
the experimental data.

In the AASHO Test the following general purposes were evolved by a national
advisory committee.

Purposes: The AASHO Road Test is intended to develop engineering facts and
criteria which can be used

(1)  In  the  design  and  construction  of  new  pavements 

(2)  In  the  preservation  or  betterment  of  existing  pavements  and  to  evaluate 
the  load  carrying  capabilities  of  existing  highways 

(3)  As  an  engineering  basis  for  the  enactment  of  adequate  and  equitable 
legislation  covering  allowable  loadings  and  highway  taxation  structure 

(4)  To  provide  information  to  assist  vehicle  manufacturers  as  to  the  types 
and  capacities  of  highway  vehicles  which  they  design,  construct,  and 
offer  as  equipment  to  obtain  overall  economy  of  highway  transportation 

(5)  To  provide  basic  information  as  to  engineering  problems  and  the  cor¬ 
related  costs  of  highways  of  different  load  carrying  capabilities,  and 
the  proper  taxation  to  cover  cost  of  structural  standards  for  highways 
which  may  be  related  to  the  cost  of  vehicle  operation. 

If one reflects for a minute over these general purposes I am sure that he
will be impressed with the fact that they represent a coverage of problems in the
highway transportation field that is, to say the least, breath-taking. The
problem in the first stage of the experimental program then is to take such high
sounding and general purposes and to translate them into more meaningful and
concrete objectives. Such a task for large experiments of this type is a
formidable one and requires a deep appreciation of the "state of the art" of the
area involved as well as an appreciation of the research capabilities of the
program being developed. In the case of the AASHO Road Test, this interpretation
of purposes into objectives took more than three years and in fact the following
objectives were not completely formulated until the experimental design itself was
completed.


The official objectives are as follows:

Objective 1: "To determine the significant relationships between the number of
             repetitions of specified axle loads of different magnitude and
             arrangement and the performance of different thicknesses of
             uniformly designed and constructed asphaltic concrete, plain portland
             cement concrete, and reinforced portland cement concrete surfaces
             on different thicknesses of bases and subbases when on a basement
             soil of known characteristics."

Objective 2: "To make special studies dealing with such subjects as paved
             shoulders, base types, pavement fatigue, tire size and pressure,
             and heavy military vehicles, and to correlate the findings of
             these special studies with the results of the basic research."

Objective 3: "To provide a record of the type and extent of effort and
             materials required to keep each of the test sections or portions
             thereof in a satisfactory condition until discontinued for test
             purposes."

Objective 4: "To develop instrumentation, test procedures, data, charts,
             graphs, and formulas which will reflect the capabilities of
             the various test sections, and which will be helpful in future
             highway design in the evaluation of the load carrying capabilities
             of existing highways and in determining the most promising
             areas for further highway research."

Because of the time restriction, let us leave the objective phase of the
experiment and consider the Designs for Data Acquisition. Let us look at the
general layout of the Road Test Experiment in order to facilitate our
consideration of this design phase of the experiment. Figures 1, 2, and 3 show the
general layout of the test. The fact that each loop must be separated from
the other loops created some design complications, but essentially the
experimental units involved are sections of pavement varying from 120 to 240 feet
in length within each loop. As mentioned earlier there were available some
400 such experimental units to use in the design.


[Figure 1. Schematic loop-tangent layout, AASHO Road Test.]

[Figure 2. Schematic of structural section.]

[Figure 3. Caption not legible; legible label: North Tangent.]


To evolve the design, it was decided that the main experiment was to center
around  a  study  of  the  interrelationship  that  involved  the  following  factors: 

Type  of  pavement 

Surface  thickness 

Base  thickness 

Subbase  thickness 

Axle  type 

Axle  load 

The design used for these variables was to first divide the two principal
pavement types, flexible (asphalt) and rigid (concrete), into separate experiments
and within each of these experiments to use essentially a factorial design. That
is, in the flexible case the factorial was taken as a 3x3x3: three surface
thicknesses, three base thicknesses and three subbase thicknesses. Figure 4 shows
in a schematic diagram the type of factorial design used and how the various
levels of each factor were assigned to the various loops. The surface types were
divided between the two tangents of the loops, the two axle types were divided
between the two lanes making up the loops, and the varying loads were divided
over the loops.
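
The 3x3x3 structure described above can be sketched in a few lines. This is only an illustration: the level labels (S1, B1, SB1, and so on) are hypothetical placeholders, since the text gives the number of levels but not their values.

```python
from itertools import product

# Hypothetical level labels; the text specifies only three levels per factor.
FACTORS = {
    "surface": ["S1", "S2", "S3"],
    "base": ["B1", "B2", "B3"],
    "subbase": ["SB1", "SB2", "SB3"],
}

def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

cells = full_factorial(FACTORS)
print(len(cells))   # number of treatment combinations in the complete factorial
print(cells[0])     # first combination in the enumeration
```

Each dictionary stands for one structural design, that is, one candidate test section before any replication is added.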

Test traffic for each main loop will consist of twelve vehicles, six in
outer lanes and six in inner lanes. Vehicles will proceed counterclockwise
around each loop at 30 m.p.h. and in a prescribed distribution of lateral
placement. Load applications are scheduled to occur simultaneously in all traffic
loops so that each structural section receives approximately 500 vehicle
applications per eighteen hour day, six days per week. The test traffic will
continue for about two years and will involve considerably more than ten million
miles of traffic. Figure 5 shows how those types of vehicles and loads were
assigned to the various loops.


[Figure 4: Factorial Design - Rigid Pavement. Schematic of the rigid pavement
factorial experiment design. Legible panel labels: Loop A, 12 kips single;
Loop C, 24 kips single and 40 kips tandem; Loop D, 30 kips single and
48 kips tandem. Legible level values include 8.0, 9.5 and 12.5.]

Notes:  x  Represents one test section.
        ®  Replicate section.
        R  Reinforced section, 6 panels at 40 ft. equals 240 ft.
        N  Non-reinforced section, 8 panels of 15 ft. equals 120 ft.




With this description of the Road Test let us give some attention to the more
detailed description of the four additional aspects of an experimental
investigation over and beyond the setting up of objectives.

Under the Designs for Data Acquisition we can mention the following sub-areas:

1) Selection of environmental and experimental units

In the Road Test we had such problems as why to locate at Ottawa, when to
begin traffic, how long to make the sections, what shape they should have, how
much spacing to have between sections, etc.

2) Selections for one-level factors, design factors, covariables

In this area, for example, the use of a single aggregate in the construction
was a major consideration. But the problems involved are numerous since it is at
this time that one starts to consider the characteristics of the model to be
evolved.

3) Selections for dependent variables

One could spend some time on this problem. Just to recall the range of
interest expressed in the general purposes leads one to recognize that perhaps
no single, simple dependent variable would suffice. At present a condition index
is being evolved using several variables in the hope that through such an index
one can measure the overall performance of a section under repeated application
of loads. How to develop such an index is a major problem in itself.


4) Selection of transducer system for all measured variables

In this area one encounters the problems of actually making the physical
measurements of the variables involved in an experiment. In the case of large
scale experiments the transducer systems often need to be automatic, which
introduces problems of both validity and reliability. The danger is that one will
become so wrapped up in developing the transducer systems that he will almost
forget the main purpose of the experiment.

5) Selections for replication factors


What is needed is a decision as to the extent of replication that should be
made in association with each of the several design variables. In the case of a
large scale experiment in which the cost of each individual observation is
considerable, a complete replication of the experiment is often uneconomical as
well as politically not feasible. However, the fundamentals of scientific
experimental design require both replication and randomization. In the Road Test
Experiment a partial replication was evolved (see Figure 4) so that within each
loop there appeared some replicated sections. One should remember that without
replication no true error variance can be obtained and thus any relationship
that is evolved from non-replicated experimental data must be taken at face
value since confidence limits cannot be determined. The other requirement, that
of randomization, frequently meets resistance from the experimental worker or
engineer on the grounds that it is simply busy work and only tends to complicate
the operation of the experiment. I feel it significant that in the Road Test the
Statistical Panel stood firm on




the principle of complete randomization and had this principle adopted by the
National Advisory Committee consisting of the foremost highway engineers of the
country. Thus the Road Test design is a randomized design. The acceptance of
this principle in a case such as this should give the lie to anyone who complains
that to randomize a design is not possible. In building these highway sections
the randomization often made it necessary to build thin sections adjacent to
thick sections, according to the way the randomization came out. This
randomization feature is one of the main requirements coming under the final
area of Design and can be considered as item six:

6) Space-time layouts for units, factors and observations.
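
A randomized layout with partial replication, of the kind described above, might be sketched as follows. The number of replicated cells (three), the seed, and the idea of shuffling all sections along a single tangent are assumptions made only for illustration; they are not the actual Road Test layout.

```python
import random
from itertools import product

def randomized_layout(levels_per_factor, n_replicates, seed=None):
    """Randomly order all factorial cells plus a few replicated cells."""
    rng = random.Random(seed)
    cells = list(product(*levels_per_factor))
    replicates = rng.sample(cells, n_replicates)  # distinct cells built twice
    sections = cells + replicates
    rng.shuffle(sections)                         # random position along the tangent
    return sections

# Hypothetical 3x3x3 design with three replicated cells.
layout = randomized_layout([range(3), range(3), range(3)], n_replicates=3, seed=1958)
for position, cell in enumerate(layout[:5], start=1):
    print(position, cell)
```

Because the order comes from the shuffle, a thin section may well land next to a thick one, which is exactly the construction consequence noted in the text.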


To turn our attention now to the third aspect, that of Experimental Data, we
have:

(1) Values for dependent variables

Here again, I could dwell at some length on the problems associated with
this aspect of an experiment. In my experience, this country is filled with
persons busily engaged in making observations on the wrong dependent variable.
We seem to have in operation an unwritten axiom which says that as long as a
dependent variable has been defined and is available, such a variable will
meet the requirements of the experiment. In the case of the Road Test, concrete
was actually being poured before an adequate dependent variable was finally
evolved. Even at this time, with the trucks actually beginning to run, it is
not certain that satisfactory values of the dependent variables have been evolved.

(2) Levels for design factors

When a balanced design such as the complete factorial is used, the limitation
encountered as to the number of levels available for the design factors is truly
trying. One cannot simply throw a couple of extra levels into the experimental
design to increase the assurance that the interesting range for the variable is
covered for each factor. The expense is overpowering due to the multiplicative
nature of most designs. In the Road Test we wanted to create a design so that
the probability of a section failing sometime during the test would be about 2/3.
This required that the section design straddle the point of adequate design for
a given type of traffic. One should note the ever-present dilemma: if one knew
how to design a highway, one could then design the experiment to find out how
to design a highway. To meet this situation we called upon the best experience
in the country on highway design to aid in the determination of the levels to be
used in the design.
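
The multiplicative growth mentioned above is easy to see with a small count. The starting point of three levels per factor follows the 3x3x3 design discussed earlier; the enlarged designs are hypothetical.

```python
from math import prod

def n_cells(levels):
    """Number of treatment combinations in a complete factorial."""
    return prod(levels)

print(n_cells([3, 3, 3]))  # the 3x3x3 design
print(n_cells([5, 3, 3]))  # two extra levels on one factor only
print(n_cells([5, 5, 5]))  # two extra levels on every factor
```

Adding two levels to a single factor already raises the count from 27 to 45 sections, and doing so for all three factors raises it to 125, before any replication is considered.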


Dr. Box and his associates have recently considered this problem and have
evolved some fairly significant results, but most of these require some
sequential programming. In a large scale experiment, especially one covering a
long time period, the inability to run preliminary experimental trials makes the
problem more critical.


(3) Levels for replication factors

You can imagine the type of problem involved in this area when you realize
that the same difficulties are present here as in the design levels. However,
there usually is not as much freedom of action in selection of replication levels
as in design levels. One often uses the general idea in such situations that the
replication should be spread over the entire sample space so as to yield a good
estimate of the error variance. As one can note from Figure 4, the Road Test
design followed this general pattern.
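
To make the role of replication concrete, the following sketch pools the within-cell variation of replicated design points into an error variance. The response values are invented for illustration and have no connection with Road Test measurements.

```python
def pure_error_variance(groups):
    """Pooled within-group variance from replicated design points.

    groups: list of lists, each holding the responses observed at one
    design point. Points observed only once contribute nothing.
    Returns (variance, degrees_of_freedom).
    """
    ss = 0.0
    df = 0
    for g in groups:
        if len(g) < 2:
            continue
        mean = sum(g) / len(g)
        ss += sum((y - mean) ** 2 for y in g)
        df += len(g) - 1
    if df == 0:
        raise ValueError("no replication: error variance cannot be estimated")
    return ss / df, df

# Hypothetical responses: two replicated cells and one unreplicated cell.
variance, df = pure_error_variance([[3.1, 3.5], [2.0, 2.4], [4.2]])
print(variance, df)
```

With no replicated cell at all the function raises an error, which mirrors the remark that without replication no true error variance can be obtained.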

(4) Values for covariables

Often in the case of a large scale experiment there are many variables that
may be measured which have some relationship with the dependent variable. The
danger here is that one may lose sight of the main goal of the program in his
zeal to measure all the variables that are available. In fact here again one
encounters an axiomatic concept in existence, namely, that if one simply
faithfully measures and records everything that happens in an experiment the
analyses of the results are bound to be fruitful. The Road Test may be
characterized in some respects as an outstanding opportunity for engineers to
attempt to measure variables with increasing precision and automation. I have
really lost tab of the extent of the data acquisition involved but the daily rate
is in the millions of digits, all of which need storage and perhaps eventual
analysis.

Let us proceed to the next aspect, that of Models for Associations, where
there are three areas delineated:

(1) Definition of experimental universe

In this case one must carefully consider what types of generalizations are
to be made. On the one extreme the results obtained in the experiment can be
simply stated to represent the particular and peculiar situation present at the
time and place and conditions of the experiment. At the other extreme, the
experimental worker may attempt to conclude that his findings are applicable over
all time and conditions. Still another problem is the determination of the sets
of variables that will be used in attempting to explain a given phenomenon. I can
only mention these problems in passing since their careful consideration would
require more time than is available.

(2) Forms to represent associations

After giving a considerable amount of time to this problem, I find that it
is in this area that one frequently has difficulties. All too often tests
are made of the data in which there is implicitly involved some given form of
association which is not applicable, but the routine of the test is carried out
with little concern for these restrictions.

Even when one directly attacks the form of association problem he finds
difficulties that are deeply rooted. One needs to consider many questions such
as: Can the "existing state of the art" provide the necessary form? How should
boundary value conditions be introduced into the form? Is one interested in an
interpolative form or an extrapolative form? Can one use a routine polynomial
model for the association and obtain satisfactory results? It should be
mentioned that when the experiment must not only provide information which will
yield the form of the association, but must also yield estimates of the constants
appearing in the determined form, such a dual requirement is most exacting.
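
The distinction between an interpolative and an extrapolative form can be illustrated with a routine polynomial passed through three design points. Everything here is hypothetical: the square-root "true" association, the design points, and the two evaluation points are chosen only to show the effect.

```python
from math import sqrt

def lagrange(points, x):
    """Evaluate the polynomial through the given (x, y) points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def truth(x):
    """Hypothetical true association, unknown to the experimenter."""
    return sqrt(x)

design_points = [(x, truth(x)) for x in (1.0, 4.0, 9.0)]

inside = 6.25   # within the range covered by the design
outside = 16.0  # beyond the range covered by the design

interp_error = abs(lagrange(design_points, inside) - truth(inside))
extrap_error = abs(lagrange(design_points, outside) - truth(outside))
print(interp_error, extrap_error)
```

In this example the quadratic is off by about 0.05 inside the design range and by a full unit outside it, so a polynomial that serves well as an interpolative form may be quite unsatisfactory as an extrapolative one.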



(3) Allocation of assumptions and hypotheses

It is apparent that care must be given as to which relationships will be
tested for their validity and which will be considered as assumptions and not
amenable to testing. Much attention has been given in recent years to the
difficulties encountered in sequential testing of hypotheses. One knows that the
usually assumed procedures that are valid for single tests fail when applied to
a sequential situation. Thus the failure to give proper allocation to assumptions
and hypotheses will often lead into the pitfall of sequential hypothesis testing.
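
One simple way to see why single-test procedures fail when many tests are run is to compute the chance of at least one false rejection. The sketch below assumes k independent tests, each at level alpha; this is a simplification chosen for illustration, and real sequential tests on the same data are not independent.

```python
def familywise_error(alpha, k):
    """Chance of at least one false rejection among k independent tests."""
    return 1.0 - (1.0 - alpha) ** k

for k in (1, 5, 10, 20):
    print(k, round(familywise_error(0.05, k), 3))
```

Under this assumption, ten tests at the 5% level carry roughly a 40% chance of at least one spurious "significant" result.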

The final aspect is that of Analysis. Here it may be noted that analyses,
especially those associated with large scale experiments, are as often
non-mathematical as they are mathematical in their nature.

Frequently an analysis will simply consist of a free hand sketch of a curve
through some plotted points, or simply a visual comparison of the distribution of
different sets of data. The more mathematical tests or analyses are reserved for
those items deserving or requiring more careful methods.

Two areas can be noted under this aspect:

(1) Transformation of data into specific associations - the estimation
problem

(2) Inferences with respect to the objectives

Since I believe these two areas are fairly well appreciated I would like to
summarize my paper by giving some impressions I have received from serving as
chairman of the Statistical Panel of the AASHO Road Test, especially as they are
related to the general problem of designing large scale experiments.

I  will  simply  itemize  these  impressions  without  comment. 

(1) There is a distinct problem of going from the general purpose of
such experiments to the objectives and finally to the design and
analyses. This becomes especially critical when one considers
the more economic aspects of the problem.

(2) The emphasis upon instrumentation development work as the end in
itself in such experiments needs to be modified.

(3) The dependent variable need exists in most problems.

(4) The danger of obtaining too much data is a real one.

(5) The education of the large number of individuals involved in the
test in at least the rudiments of scientific method is essential.

(6) The interrelationship between the main problem and related
problems needs to be carefully studied.

(7) Committees can only do certain things in experimental work; the
main decision making must be left to one or two individuals.

(8) Large scale experiments often introduce the need for Robust
designs since one cannot risk the validity of the results to
some "high-power" assumption.


MULTIPLE CORRELATION OF MECHANICAL WITH
BALLISTIC PROPERTIES OF ARMOR PLATE

Olga Sipes

Research and Development Group, Metallurgy Research Laboratory

Frankford Arsenal


List of Symbols

    $N$                        Number of samples
    $r$                        Simple correlation
    $r_{12.5}$, $r_{12.6}$, $r_{15.6}$,
    $r_{12.56}$, $r_{15.26}$, $r_{16.25}$
                               Partial correlation coefficients for the
                               variables designated by subscripts
    $\bar{X}$                  Arithmetic means
    $\Sigma x^2$               Total variation
    $\Sigma x_c^2$             Explained variation
    $\Sigma x_s^2$             Unexplained variation
    $\sigma$                   Standard deviation
    $\sigma_s$                 Standard error of estimate
    $R$                        Multiple correlation coefficient


INTRODUCTION. A great deal of effort is currently being expended by
the personnel of the Metallurgy Research Laboratory at Frankford Arsenal in
an attempt to find an adequate specification for aluminum armor plate.

To do this by means of ballistic testing is costly and time consuming;
therefore, the desirability of finding one or two simple mechanical or
metallurgical tests whose results would correlate closely with the
ballistic test is obvious. In any case it was suggested that a statistical
analysis, using the techniques of multiple correlation, be investigated
in order to provide a quantitative index of the relative importance of
the mechanical properties, singly or in combination, as they would pertain
to the ballistic limits of an alloy.


In this particular study four aluminum alloys from the Al-Mg family,
made to the same specification but supplied by different manufacturers, were
considered; and six mechanical properties were correlated with the ballistic
limit for specimens within this alloy family.


To determine whether or not the alloys should be studied individually
or as a group, assuming the importance of variation in alloy chemical
composition, the "t-test" was used. This analysis indicated that the differences
observed in the means of the ballistic limits were non-significant:

tx^  =  0,^226  <  21


Hence, the data for all the aluminum alloys were treated as a group independent
of composition.

The mechanical properties to be correlated with the ballistic limit ($X_1$)
were as follows:

    Yield Strength ....................  $X_2$
    Ultimate Strength .................  $X_3$
    Modulus of Resilience .............  $X_4$
    Area Under Stress-Strain Curve ....  $X_5$
    % Elongation ......................  $X_6$
    % Contraction in Area .............  $X_7$


To work with these six independent variables would make the task extremely
complex and time consuming. Two methods were employed to reduce the number
of variables. First, those properties which were related to each other or
which measured similar characteristics were considered in order to eliminate
one of them. Thus, modulus of resilience, which is related to yield strength,
was eliminated. In turn, of the two common measures of ductility, percent
contraction in area was eliminated in favor of percent elongation. The second
method was to compute the simple correlation of each mechanical property with
the ballistic limit and to confirm the reliability of these correlations by
use of the "t-test" for "r" and the table of 1% and 5% points for "r". If the
correlation was found to be non-significant, the mechanical property under
consideration was dropped. When the simple correlation was found to be of
borderline significance, as in the case of the ultimate strength, the decision
to drop or retain this variable was based upon additional considerations. Thus
the final decision to eliminate the ultimate tensile strength was based upon
physical reasoning as supplied by the metallurgist, as well as statistics.
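
The "t-test" for "r" mentioned above is usually computed as t = r sqrt(N - 2) / sqrt(1 - r^2) with N - 2 degrees of freedom. The paper does not print the formula it used, so the standard form is assumed here. The inputs are the simple correlation of 0.637 quoted later in the paper for yield strength and the 23 specimens of Table I.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)

t_yield = t_for_r(0.637, 23)
print(round(t_yield, 2))  # compare with the tabled t for 21 degrees of freedom
```

The result, about 3.79, exceeds the two-sided 5% point of t for 21 degrees of freedom (about 2.08), which agrees with yield strength being retained as a variable.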

By this process of elimination the problem was reduced to the manageable
task of considering three mechanical properties (yield strength, area under
stress-strain curve, and percent elongation) as the independent variables to
be correlated with the ballistic limit as the dependent variable.

STATISTICAL PROCEDURE AND DISCUSSION OF RESULTS. It was convenient to
compute at one time all the values of the sums and product sums that were
needed in different formulae throughout the work and arrange them in tabular
form for ease of manipulation (Table II). Proceeding from this point, the
relationship or simple correlation between the dependent and each independent
variable was determined. One variable was chosen to begin the evaluation and the
remaining variables were introduced one at a time to note their effect
individually and totally on the correlation. It was possible to study these
effects through the changes of the explained and unexplained variations, the
changes in correlations and in the standard errors of estimate. Partial
correlation was used to a great degree to note these changes and to determine
the weights of each independent variable. Finally the estimating equation was
derived



including the four variables; the standard error of estimate for these
variables and the multiple correlation were also obtained. Table I shows the data
for the 23 specimens of the alloy under consideration. To determine the
individual effects of the three factors $X_2$, $X_5$, and $X_6$ on the ballistic
limit, $X_1$, refer to the corresponding table. From this table it appears that
% elongation is the most important of the three independent variables. It has
the biggest explained squares, explained variation and coefficient of correlation
and it has the smallest unexplained variation and standard error of estimate.
Figure 1 shows the scatter diagram of the simple relationship between the
ballistic limit and each of the independent variables; the lines of best fit
with a band of plus or minus one standard error of estimate, and the
coefficients of simple correlation, are also shown at the bottom. From this
figure it appears that percent elongation is the most important factor: the
slope of the line approaches minus one, it has the largest simple correlation,
and the values are more concentrated around the line of best fit, as indicated
by the narrower band or smaller standard error of estimate. Area under the
stress-strain curve is of about the same or slightly lesser importance than
% elongation, and yield strength is the least important. At this point it is
tempting to say "Why go any further, we have established the important
variables, what more do we want?" But this ranking does not necessarily hold
true when other variables are introduced. The problem is to determine whether
it does.

The calculation used yield strength as the first factor; then % elongation
and area under the stress-strain curve were considered in the order named.

More information is obtained by a careful study of Figure 2. Section A
indicates deviations of $X_1$ from their mean, that is, the total deviations;
while section B shows the deviations of the estimates of the ballistic limit
from their mean, that is, the individual explained variations. Roughly, a small
number of the bars in section B appear about the same as those in section A.
Section C indicates the individual variations that have not yet been accounted
for, that is, the deviations of the actual ballistic limit values from the
estimated values. Inspection of the bars in section C will roughly verify the
magnitude of the unexplained variations or "residuals" (since they are obtained
for each sample by algebraically subtracting the estimated value from the actual).

In general, the bars in section C are smaller than those in section A,
but there are exceptions. It is yet to be explained why the ballistic limit is
so low in samples 12 and 13, moderately low in 19, 21 and 32, and so high in 11
and moderately high in 2, 15, and 30. Some clue to this difficulty is given by
reference to Figure 3. In each section of this figure the dependent variable
is the individual unexplained variation in the ballistic limit, taken from
section C of Figure 2. Section A (Figure 3) shows samples 12 and 32 with large
and moderately large negative residuals in the ballistic limit but a high
reading in area under the stress-strain curve. Sample 11 is very high with
respect to positive residual and moderately low in its reading of area under
the stress-strain curve, and the latter for sample 19 appears to be moderately
above average but low in residual. Section B shows that samples


*  Tables  and  Figures  are  placed  at  the  end  of  this  article. 




12 and 19 are low in residual for % elongation also, and sample 11 shows
a very high residual but is closer to the average in % elongation.


From an examination of the two sections of Figure 3 it appears that
area under stress-strain curve and % elongation have approximately the same
effect on the ballistic limit. It may be possible to reduce the errors in
the estimate and improve the correlation by including one or the other
as a second factor. Percent elongation was taken as the second factor
since its values are a little more concentrated around the mean ($r_{16}$ is
slightly larger than $r_{15}$).


The results of some of the calculations using two independent variables
are shown in Figure 4. Section A is the same as in Figure 2, but in section
B the line lengths have increased. Statistically, the explained variation
has increased from $\Sigma x^2_{c1.2} = 71,031$ to $\Sigma x^2_{c1.26} = 100,450$.
The increase in the explained variation is approximately 29,400. Since this
increase is large the correlation increased considerably, from $r_{12} = 0.637$
to $R_{1.26} = 0.757$; but the increase from $\Sigma x^2_{c1.6} = 97,623$ to
$\Sigma x^2_{c1.26}$ is not large (2830), hence the increase from
$r_{16} = -0.744$ to $R_{1.26} = 0.757$ is small. Likewise the unexplained
variation in section C has been reduced. Correspondingly, the standard error of
estimate $\sigma_{s1.26} = 57.01$ ft/sec has declined from
$\sigma_{s1.2} = 67.3$ ft/sec and slightly declined from
$\sigma_{s1.6} = 58.01$ ft/sec.
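
The figures just quoted hang together through two simple relations, which the sketch below checks. Three things are assumed rather than stated in the paper: that the 71,031 figure is the explained variation for yield strength alone, that the total variation can be recovered from r^2 = explained / total, and that the standard error of estimate is the square root of unexplained variation divided by N.

```python
from math import sqrt

N = 23                    # specimens (Table I)
explained_one = 71_031    # explained variation, yield strength alone (assumed reading)
explained_two = 100_450   # explained variation, yield strength plus % elongation
r_one = 0.637             # simple correlation quoted with the first figure

# Assumption: total variation recovered from r^2 = explained / total.
total = explained_one / r_one ** 2

def multiple_r(explained, total):
    """Correlation coefficient implied by a given explained variation."""
    return sqrt(explained / total)

def std_error(explained, total, n):
    """Standard error of estimate, assuming a divisor of n."""
    return sqrt((total - explained) / n)

print(round(multiple_r(explained_two, total), 3))
print(round(std_error(explained_one, total, N), 1))
print(round(std_error(explained_two, total, N), 1))
```

These reproduce, to rounding, the multiple correlation of 0.757 and the standard errors of about 67.3 and 57.0 ft/sec given in the text, which lends some support to the assumed divisor of N.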

Figure 5 shows the scatter diagram of values for area under stress-strain
curve adjusted for yield strength and % elongation. In this figure the
ballistic limit is considerably below the estimate for sample 13, far
above for sample 11 and moderately above for 30; however, these samples
are close to the average value of area under the stress-strain curve for
the group. It remains to be seen whether area under stress-strain curve,
as such, is an important explanation of the ballistic limit. In reference
to Figure 7, note that as the coefficient of multiple correlation becomes
larger the standard error of estimate becomes smaller. The R's which
include % elongation as a factor are the largest and the standard errors which
include this factor are the smallest. From this calculation it appears that the
variation in the ballistic limit is influenced more by percent elongation
than by the area under the stress-strain curve, and by yield strength last.


Before introducing the last independent variable (area under stress-strain
curve), the partial correlation was considered to learn whether the
relative importance of the different independent variables in explaining
variations in the dependent variable was the same as already found by
multiple correlation. This was done by finding the extent to which correlation
was increased by addition of another factor. By definition, partial
correlation is a measure of the relationship between the dependent variable
and one independent variable when the influence of the other independent
variables has theoretically been removed from both. More precisely, it is
the square root of the ratio between the increase in the variation of the
computed values of the dependent variable resulting from introducing
another variable, and the variation that had not been explained before
the introduction of the new factor.
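
The verbal definition above translates directly into a short calculation. As a check it is applied to figures quoted elsewhere in this paper: explained variation of 71,031 with one variable, taken here to be yield strength, and 100,450 after % elongation is added. The total variation is not printed legibly, so it is recovered from the simple correlation of 0.637, which is an assumption. The square root gives only the magnitude; the sign is taken from the regression coefficient.

```python
from math import sqrt

def partial_r_magnitude(explained_before, explained_after, total):
    """|partial r| = sqrt(gain in explained variation / variation previously unexplained)."""
    unexplained_before = total - explained_before
    return sqrt((explained_after - explained_before) / unexplained_before)

# Assumption: total variation recovered from r^2 = explained / total, with r = 0.637.
total = 71_031 / 0.637 ** 2

# Ballistic limit vs. % elongation, with yield strength already in the equation.
r_elongation_given_yield = partial_r_magnitude(71_031, 100_450, total)
print(round(r_elongation_given_yield, 3))
```

The magnitude obtained, about 0.53, agrees with the value of -0.531 reported in the paper for the partial correlation of ballistic limit with % elongation adjusted for yield strength.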




One of the formulas for partial correlation is:

$$r_{12.6} = \sqrt{\frac{\Sigma x^2_{c1.26} - \Sigma x^2_{c1.6}}{\Sigma x_1^2 - \Sigma x^2_{c1.6}}}$$


The values of the partial correlations found are as follows:

    $r_{12.6} = -0.194$
    $r_{16.5} =$
    $r_{12.5} = 0.069$
    $r_{15.6} =$
    $r_{15.2} = -0.463$
    $r_{16.2} = -0.531$

It might be thought that as additional factors were held constant the
dependent variable would become progressively less closely associated with
a given independent variable. For instance, the simple correlation between
the ballistic limit and yield strength was found to be $r_{12} = 0.637$, but
when the area under stress-strain curve and % elongation were each brought
into the picture (technically, when the ballistic limit and the yield strength
were adjusted for variation in area under stress-strain curve and % elongation)
$r_{12.5}$ was 0.069 and $r_{12.6}$ was -0.194. What appeared to be a
relationship between yield strength and ballistic limit was in fact largely a
relationship between area under stress-strain curve and ballistic limit, and
between ballistic limit and % elongation. The other partial coefficients were
interpreted analogously. The relationship between ballistic limit and area under
stress-strain curve, and between ballistic limit and % elongation, exists to
a much larger degree than that between ballistic limit and yield strength.

Thus it appeared that the independent variables, taken together, produced a
partial correlation that was fairly constant from sample to sample. There
were some exceptions to this statement, notably sample 11.

The  results  of  the  computation  of  partial  correlation  led  to  the  same 
conclusions  as  did  multiple  coefficients:  that  %  elongation  was  more  closely 
related  to  the  ballistic  limit  than  either  area  under  the  stress-strain  curve 
or  yield  strength,  and  that  of  the  latter  two,  area  under  stress-strain  curve 
was  more  influential  than  yield  strength.  The  results  of  the  computation 
using  two  independent  variables  are  shown  in  Table  IV. 

It  remained  to  be  seen  whether  the  conclusions  concerning  the  relative 
importance  of  the  three  independent  variables  remained  the  same  when  all 
four  were  considered  simultaneously,  rather  than  as  different  combinations 
of  three  variables. 

Having added the area under stress-strain curve into the calculation,
it was noted that the explained and unexplained variations did not change
very much from the values obtained with yield strength and % elongation.
Mathematically, Σx²_{c1.256} = 102,584 and Σx²_{c1.26} = 100,450; their
difference is 2160. The values for the partial correlation were:




r_{12.56} =        r_{16.25} =        r_{15.26} =

Figure 7 is illustrative of progress made thus far. It may be noted
that the coefficient of multiple correlation, R_{1.256} = 0.764, steadily
became larger and the standard error of estimate, S_{1.256} = 56.3 ft/sec,
steadily became smaller. By substituting the numerical values obtained
for the constants in the normal equations computed by the method of least
squares and solved by Doolittle’s technique, the estimating equation became:

X_{c1.256} = a_{1.256} + b_{12.56} X_2 + b_{15.26} X_5 + b_{16.25} X_6

=  28^9  -.CX)567X^  ♦  ^027373^  -57.U60X^ 

To test the significance of the multiple correlation the analysis of
variance was used. The F table indicated that for the .001 level of
significance and with 3 and 19 degrees of freedom (n_1 and n_2), F should equal
8.28. Since the computed value for F (8.95) was larger than the tabular
value, it can be said that R_{1.256} is significant.
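The F test above can be approximated from the multiple correlation alone. The sketch below is illustrative only: the function name is hypothetical, the value 0.764 is an assumed reading of the reported multiple correlation, and the 23 samples and 3 independent variables are taken from the surrounding text. Because it starts from a rounded correlation, it gives roughly 8.9 and not exactly the 8.95 quoted.

```python
def f_from_multiple_r(r, n_samples, n_predictors):
    """F ratio for testing a multiple correlation coefficient.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), where k is the number of
    independent variables and n is the number of samples.
    """
    r2 = r * r
    df1 = n_predictors
    df2 = n_samples - n_predictors - 1
    return (r2 / df1) / ((1 - r2) / df2), df1, df2

F_TABLE_001 = 8.28  # tabular F for 3 and 19 degrees of freedom, as quoted in the text

f_value, df1, df2 = f_from_multiple_r(0.764, n_samples=23, n_predictors=3)
significant = f_value > F_TABLE_001
print(df1, df2, round(f_value, 2), significant)
```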

Finally, Figure 6 shows at this point that the addition of the area
factor has not, on the whole, improved the estimate very much as compared
with the first two independent variables used.

SUMMARY. From the foregoing discussion it may be concluded that it
would have been sufficient to work with two independent variables only:
yield strength and % elongation or yield strength and area under the
stress-strain curve, since the latter and per cent elongation show practically the
same effect on the ballistic limit.

For this particular set of experimental data and statistical treatment
the multiple correlation of 0.764 indicates that there is a relationship
among the mechanical properties considered and the ballistic limit,
and that about 60% of the variance in the ballistic limit is accounted
for by the three factors considered. Moreover, if the readings of the
mechanical properties of each sample are multiplied by the optimum
weight factor indicated by the partial regression coefficients in the
estimating equation, these readings would predict a ballistic limit
within plus or minus 56.3 ft/sec about 68% of the time, and within
plus or minus 112.7 ft/sec about 95% of the time.

Very much of the remaining unexplained variation could be due to
the variability of the ballistic test. The standard error of 56.3 ft/sec
obtained above is not far from the variation normally observed in ballistic
testing.




TABLE I

Sample   Ballistic     Yield          Area Under            % Elongation
No.      Limit (X1)*   Strength (X2)  Stress-Strain         (X6)
                                      Curve (X5)

 2       2320          38200          14492                 12.0
 7       2240          38500          11218                 12.2
10       2140          24600          15273                 18.0
11       2400          35200          11053                 12.5
12       2080          33100          14634                 16.7
13       2120          39600          11682                 12.8
15       2280          35400          12152                 12.8
16       2235          33600          11402                 12.6
17       2240          34200          13214                 15.2
19       2140          35300          12334                 14.6
20       2345          46800           7633                  7.9
21       2180          37400          11710                 13.4
22       2215          39600          11026                 12.5
30       2345          42200          11090                 14.6
31       2090          24600          14527                 17.5
32       2125          3aoo           13871                 15.4
33       2220          37200          12659                 13.9
34                     29600          13046                 14.4
35       2220          38600          10630                 14.6
36       2275          37000          11146                 14.8
37       2310          47100           9132                  9.0
45       2330          45000          10414                 10.0
46       2230          41700           9437                 10.1

* Each value in column X1 is obtained by taking the average of three highest partial and three
lowest complete penetrations with a grouping of six falling within 125 ft/sec.



TABLE II. Computation of Deviation Product Sums Required for Measures of Relationship between Ballistic Limit and Yield Strength, Area under
Stress-Strain Curve and Per Cent Elongation, of 23 Samples of Armor Plate


Figure 2.


Figure 4. Scatter diagram of area (X_5) and X_1 values adjusted
for yield strength and percent elongation


Deviations of X_1 values from their mean. Deviations from their mean of computed X_1 values. Deviations of X_1 values from computed values
based upon estimating equation using yield


Figure 7b.


As the number of variables increases the coefficient of correlation increases


Figure 7


ANALYSIS OF CATHODE INTERFACE RESISTANCE EXPERIMENT¹

M. H. Zinn

U. S. Army Signal Research and Development Laboratory

At the Third Conference on Design of Experiments the general problem of
electron tube experiments was discussed in a clinical paper.² The discussion
was illustrated with a particular experiment concerning the study of cathode
interface resistance growth during life of receiving-type electron tubes. It
is the purpose of this paper to review the experimental design and discuss the
statistical techniques utilized in the analysis of the data.

Cathode interface resistance is caused by the formation of a layer at the
interface between the barium-oxide coating and the nickel base of an
oxide-coated cathode. The layer is formed by a chemical reaction between the barium
oxide and impurities in the nickel, such as silicon, magnesium, manganese,
aluminum, tungsten, etc. Silicon impurities react to form barium-orthosilicate,
which is considered by many workers in the cathode field to be responsible for
the high resistance type of layer. The growth of the layer is influenced by
the temperature of operation of the cathode and the conditions of operation of
the tube. The experimental design set up to test the effects of these various
factors and the influence of different manufacturing processes is shown in
Figure 1. As can be seen, a complete factorial design was used. Four types
of nickel alloy were selected for the test, and a quantity of each alloy
selected from a particular melt was sent to a single cathode manufacturer to
be formed into cathode sleeves. The finished cathode sleeves were then divided
among three tube manufacturers who used them in the construction of a common
tube type selected for the test. Each alloy-manufacturer lot was tested at
three levels of filament voltage corresponding approximately to three levels
of cathode temperature and three levels of plate current. The sample size,
or number of replications of the experiment, was chosen based on the equation

$$ n = (1.645 + 1.960)^2 \, \frac{\sigma_e^2}{\delta^2} $$

for β = .05 and α = .05, where

σ_e² = 2σ²

and

σ² = residual variance of a homogeneous group

δ = desired minimum resolution between two groups


¹ Work carried out for Signal Corps by Briggs Associates, Inc. under
contract DA56-059 sc-72556.

² Problems in Analysis of Electron Tube Experiments - M. H. Zinn.




A ratio of 1.5 between means on an arithmetic base was arbitrarily selected as
the minimum resolution desired, yielding a value of 0.176 for δ on a logarithmic
base. An estimated value of 0.358, based on a limited amount of data, was used
for the standard deviation of log interface of a homogeneous group. This results
in a minimum sample size of 107 tubes required to detect a significant difference
between two homogeneous groups. Since this sample size, if it were assigned to
each individual cell in the factorial experiment, would lead to a huge number of
tubes to be tested over a period of 5000 hours each, a compromise was reached by
using this number as the approximate size of an alloy-manufacturer group.
Actually 13 tubes were tested under the 9 conditions of operation for a total of 117
tubes for each alloy-manufacturer group. This resulted in a total of 156 tubes
under each of the 9 life test conditions.
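The sample-size arithmetic can be checked with a short sketch. The function name and argument names are hypothetical. The factor of two in the variance (the variance of a difference between two groups) is an assumption; it is the reading under which the stated inputs reproduce the 107 tubes quoted. Which normal deviate belongs to α and which to β is likewise assumed, although only their sum matters here.

```python
import math
from statistics import NormalDist

def tubes_per_group(sigma, ratio, alpha=0.05, beta=0.05):
    """Approximate sample size needed to resolve a given ratio between two
    group means when readings are analyzed on a log10 scale.

    sigma : standard deviation of log10 readings within a homogeneous group
    ratio : minimum ratio between arithmetic means to be resolved
    """
    unit_normal = NormalDist()
    z_alpha = unit_normal.inv_cdf(1 - alpha / 2)  # about 1.960 (assumed two-sided)
    z_beta = unit_normal.inv_cdf(1 - beta)        # about 1.645 (assumed one-sided)
    delta = math.log10(ratio)                     # 1.5 -> about 0.176
    sigma_e_sq = 2 * sigma**2                     # ASSUMPTION: variance of a difference
    return (z_alpha + z_beta) ** 2 * sigma_e_sq / delta**2

n = tubes_per_group(sigma=0.358, ratio=1.5)
print(round(n, 1))
```

With these inputs the result falls between 107 and 108, in line with the figure given above.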

Life tests were initiated in accordance with this experimental design with
the modification that two burning runs were made. The first burning run was made
with 7 tubes placed on life under each cell condition for 5000 hours. At the end
of this period 6 tubes were placed on life for another 5000-hour period to
constitute the second run.

While the life hours were being accumulated, the opportunity was provided
to study the problems involved in the analysis of the data. Three problems were
considered to be of great importance; one problem is peculiar to this experiment,
while the other two problems are common to most electron tube experiments
involving life tests.

The first problem resulted from the choice of a twin triode as the test
vehicle. (A twin triode consists of two triode sections each with individual
cathodes in the same envelope.) If the two sections of each tube behaved as
independent samples from the same tube population, the effective size of the
sample would be doubled. A doubling of the sample size would tend to minimize
the effects of higher order interactions than the two-way interactions for
which we had made provisions to detect, if present, in the experimental design.
It was possible, however, that the close environmental conditions of two triodes
in the same envelope would cause a mirroring effect, which would result in such
small differences between samples that one would not be justified in using the
sections as individual replications. In addition, a third possibility existed
that there would be some bias existing between the sections because of methods
of processing or due to fixed filament voltage differences that would require
the sections to be treated as two separate groups. This would add a new factor
to the experiment, which could result in additional interactions to weaken the
power of the analysis. The solution to this problem is discussed below.

A second problem considered prior to the collection of the complete data
was methods of overcoming variance and drift in the true levels of real factors
that could not be adequately controlled. In this experiment this problem was
due to the inability to directly control the cathode temperature, which
represents the real variable affecting the growth of interface rather than the
filament voltage, which could be controlled. The effects of this lack of
control would mean a wider spread in residual variance than would otherwise
be present. No solution was found for this problem during the course of the
experiment, and this contribution to the residual error had to be accepted.

The third problem was the method of treatment of readings at various
periods during life. Should these be treated as another factor in the analysis
of variance or should other methods of treatment be utilized? This point will
also be touched on later in the discussion.

In performing the actual analysis of the data, a standard analysis of
variance was performed. The problem of the treatment of the sections was
resolved by considering them as another factor in the analysis, as advised
by Professor Hartley at the Third Conference on Design of Experiments, with
the added feature that a two-sided test of the F ratio was to be made for the
sections rather than the usual one-sided test for significantly large
differences in variance. The inclusion of the section factor in the analysis of
variance and a run factor due to the fact the cell lots were divided
approximately in two resulted in the following overall analysis requirements:

(RSAMIE) Replications

where

R = Runs
S = Sections
A = Alloys
M = Manufacturers
I = Plate Current Operating Conditions
E = Filament Voltage Operating Conditions

or

(2 × 2 × 4 × 3 × 3 × 3) 7 = 3024

readings to be analyzed for each of 10 reading periods taken during the
5000-hour test.
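The bookkeeping behind these counts is easy to mislay, so a small sketch may help. The dictionary keys and variable names are illustrative only; the level counts, the 7 replications, and the 13 tubes per cell are taken from the text.

```python
import math

# Number of levels for each factor in the overall (RSAMIE) analysis.
levels = {
    "runs": 2,
    "sections": 2,
    "alloys": 4,
    "manufacturers": 3,
    "plate_current": 3,
    "filament_voltage": 3,
}
replications = 7

readings_per_period = math.prod(levels.values()) * replications

# Tube counts implied by 13 tubes per cell.
tubes_per_cell = 13
operating_conditions = levels["plate_current"] * levels["filament_voltage"]
tubes_per_alloy_mfr_group = tubes_per_cell * operating_conditions
tubes_per_condition = tubes_per_cell * levels["alloys"] * levels["manufacturers"]
total_tubes = tubes_per_condition * operating_conditions

print(readings_per_period, tubes_per_alloy_mfr_group, tubes_per_condition, total_tubes)
```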

A  search  was  instituted  to  find  a  machine  program  that  could  handle  this 
number  of  factors.  It  was  determined  that,  even  though  each  alloy  group  was 
analyzed  separately,  which  appeared  to  be  desirable  based  on  initial  analyses 
showing  large  differences  between  alloys,  and  each  run  was  handled  separately 
with  the  run  analysis  done  by  manual  methods  at  a  later  time,  the  cost  of 
programming  the  remaining  SMIE  four-factor  analysis  was  higher  than  our  budget 
could  handle.  Further  study  of  the  problem  by  Mr.  R.  Dickson,  the  statistician 
on  the  program,  indicated  that  the  analysis  could  be  carried  out  completely  on 
a  manual  basis  using  statistical  clerks  to  perform  the  calculations,  provided 
that  the  manual  program  was  organized  properly.  If  the  program  was  handled 
on  this  basis,  it  would  be  possible  to  remain  within  the  costs  budgeted  for 
the analysis and obtain the desired results. The manual analysis would also
make it possible to perform additional graphical treatment of the data, since
all of the subtotals would be automatically available, compared to having to
pay for these subtotals in programming and machine time, if automatic machine
calculations were used. The lower cost of the manual program is made possible
only by being able to stop the calculation process at appropriate points and
make a decision based on preliminary plotting of average values that there is
nothing to be gained by continuing the calculation. Thus, all of the
early-life readings, where a simple plot of the data could show that there were no
significant differences, did not need to be put through a complete analysis
of variance.

The organization of the manual program was based on the use of a three-way
table such as is illustrated in the Appendix. A step-by-step procedure for using
this table is also included. As can be seen, the table covers a three-factor
analysis of Section, Plate Current, and Filament Voltage (SIE) effects for a
single manufacturer and a single alloy and covers the first run of seven tubes
out of the 13-tube sample per cell. Similar tables were used to enter the
combined calculations for the two runs and then to combine the results of tests
for the three manufacturers. No attempt was made to combine the results of
different alloys since, as previously mentioned, the differences between alloys
were large.

It  should  be  noted  that  the  data  shown  in  the  table  included  in  the 
Appendix  represent  early  calculations  that  were  performed  when  the  organization 
of  the  data  was  being  worked  out.  At  that  time  the  data  were  carried  in  the 
averages  to  only  three  decimal  places,  which  resulted  in  negative  values  being 
calculated  for  some  of  the  variance  estimates.  This  was  corrected  in  the 
later  analyses  where  the  sums  and  averages  were  carried  out  to  five  decimal 
places,  thus  eliminating  the  calculation  error  resulting  in  fictitious  nega¬ 
tive  values  of  variance.  The  method  is  open  to  criticism  in  that  use  is  made 
of  averages  at  early  stages  of  the  calculation  rather  than  in  carrying  partial 
sums.  The  use  of  the  procedure  can  be  justified  on  the  basis  that  the  numbers 
carried  through  the  procedure  are  relatively  simple,  with  an  ordered  level  of 
magnitude.  This  permits  a  relatively  untrained  calculator  to  spot  his  own 
calculation  errors  or  transpositions  of  entries  as  he  goes  along,  and  it  sim¬ 
plifies  the  checking  procedures.  It  also  simplifies  the  treatment  of  missing 
entries  caused  by  failure  of  a  tube  for  reasons  other  than  interface  resistance 
prior  to  a  reading  period.  These  failures  were  few  enough  in  number  to  permit 
use  of  the  section-burning-condition  average  value  for  the  missing  tube,  thus 
allowing  a  constant  sample  size  to  be  used  for  variance  estimates.  The  advan¬ 
tages  more  than  offset  the  error  introduced  by  taking  premature  averages.  It 
is possible, however, to use the same basic organization and carry total sums,
if  one  so  desires. 
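The effect of carrying averages to too few decimal places can be shown with a small sketch. The three cell averages below are hypothetical, and the shortcut formula (sum of squares of the averages minus n times the square of the rounded grand average) is an assumption about how such a term might be formed by hand; it is not taken from the Appendix chart.

```python
def ss_shortcut(means, places):
    """Sum of squares about the grand average using the shortcut formula,
    with the grand average rounded to a fixed number of decimal places."""
    n = len(means)
    grand = round(sum(means) / n, places)
    return sum(m * m for m in means) - n * grand * grand

def ss_deviations(means):
    """Sum of squared deviations from the unrounded grand average."""
    n = len(means)
    grand = sum(means) / n
    return sum((m - grand) ** 2 for m in means)

cell_means = [1.20, 1.22, 1.23]  # hypothetical averages, not data from the experiment

ss_three_places = ss_shortcut(cell_means, 3)  # grand average carried as 1.217
ss_five_places = ss_shortcut(cell_means, 5)   # grand average carried as 1.21667
ss_exact = ss_deviations(cell_means)

print(ss_three_places < 0, ss_five_places > 0)
```

With three decimal places the shortcut sum of squares comes out negative although the true value is positive; with five places the sign is correct and the value is close to the exact one.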

The data resulting from the analysis of variance will be presented in a
final report on the program. It will be in tabular and graphical form. The
methods used for presentation of these results, rather than the results
themselves, should be of interest to this audience. The tabular summary will
contain the calculated variances for each of the main effects and interactions
and the residual variance due to error or uncontrolled effects. The average
residual variance for the overall experiment was determined to be 0.085 on a
logarithmic base. The measured value of σ was, therefore, 0.292 compared to
the estimated value of 0.358 used to calculate the sample sizes. The results
indicate that the planned statistical power of the experiment to detect
differences between alloy-manufacturer samples equivalent to an arithmetic ratio
of 1.5 was obtained. Significant differences between smaller samples
representing the individual cell groups were detected because the differences that
arose due to the test conditions were in many cases greater than the minimum
detectable limit selected.
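One way to see why the planned power was reached is to turn the sample-size relation around and ask what ratio could be resolved with the measured residual variance. The sketch below does this under the same assumptions as the planning calculation (α = β = .05 and a variance of 2σ² for a difference between two groups); the function name and the use of 117 tubes per alloy-manufacturer group as the sample size are illustrative choices.

```python
import math
from statistics import NormalDist

def detectable_ratio(residual_variance, n_per_group, alpha=0.05, beta=0.05):
    """Smallest ratio between arithmetic means resolvable between two groups,
    given the residual variance of log10 readings and the group size."""
    unit_normal = NormalDist()
    z_sum = unit_normal.inv_cdf(1 - alpha / 2) + unit_normal.inv_cdf(1 - beta)
    sigma = math.sqrt(residual_variance)
    delta = z_sum * sigma * math.sqrt(2 / n_per_group)  # ASSUMPTION: two-group difference
    return sigma, 10 ** delta

sigma, ratio = detectable_ratio(residual_variance=0.085, n_per_group=117)
print(round(sigma, 3), round(ratio, 2))
```

Under these assumptions the resolvable ratio comes out below 1.5, which is consistent with the statement that the planned resolution was obtained.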



In addition to the tabular summary, the data have been presented in
various graphical forms. Figure II is illustrative of one of the methods of
presenting the life data. It is a plot of the results of the four alloys
for one manufacturer. The final presentation will include an individual plot
for each manufacturer and alloy of the average curve similar to each of these
curves with one sigma limits for the grand average of all burning conditions
for the sample size used and one sigma limits for the average of burning
conditions. Note the rather large differences in alloys that were experienced;
these resulted in the alloy analyses being handled separately.

Figure  III  also  shows  life  data.  In  this  figure  curves  are  shown  for 
three of the four alloys for all three manufacturers. The curves for alloy
220  have  been  omitted  to  eliminate  confusion  since  they  would  fall  close  to 
the  330  alloy  curves.  Note  the  large  alloy-manufacturer  interaction  that  is 
apparent  even  without  resorting  to  analysis  of  variance.  The  P-50  alloy  shows 
no  significant  differences  between  manufacturers  while  the  A-32  shows  a  pro¬ 
nounced  difference  all  through  life.  The  330  alloy,  however,  only  shows  a 
significant  difference  at  the  later  stages  of  the  5000-hour  tests. 

Figure  IV  illustrates  the  method  used  to  graphically  show  the  effects  of 
the  burning  conditions.  These  plots  have  been  made  for  individual  alloys  and 
for all manufacturers, when no significant manufacturer effects were present,
or  for  each  manufacturer  when  required.  The  crosses  represent  the  burning 
conditions  of  filament  voltage  and  plate  current.  The  figure  in  parentheses 
represents  the  average  value  expressed  in  arithmetic  values  of  interface 
resistance.  Contour  lines  have  been  drawn  onto  this  matrix  showing  the 
placement  of  a  contour  corresponding  to  the  average  of  all  of  the  burning 
conditions  and  contours  corresponding  to  one  sigma  limits  due  to  the  residual 
error  of  the  test.  A  contour  plot,  such  as  represented  by  this  figure,  for 
the  P-50  alloy  and  all  manufacturers,  showing  a  very-flat  topographical 
structure,  is  indicative  of  no  significant  effects  due  to  the  various  levels 
of  burning  conditions. 

Figure  V,  covering  alloy  220  and  manufacturer  1,  shows  a  little  more 
topographical  structure.  The  presence  of  more  contour  lines  between  burning 
points  indicates  that  some  significant  effects  are  present.  If  all  of  the 
contour  lines  were  straight  lines  essentially  parallel  to  each  other,  this 
would  mean  that  a  main  effect  was  present.  With  the  curvature  as  shown, 
it is indicative that a voltage-current interaction is present, raising a
slight but significant hill at the 5.7-volt, 0.9-milliampere condition.

Figure  VI  illustrates  the  interaction  effect  even  more  graphically, 
showing data for alloy A-32, manufacturer 1. The number of contour lines
has  increased  considerably,  showing  highly  significant  differences  existing, 
with  a  severe  interaction  effect  as  shown  by  the  large  curvature,  plus  a 
significant  plate  current  effect.  The  presentation  of  contour  data  for  differ¬ 
ent  periods  of  life  for  the  same  alloy  would  show  a  general  lifting  of  the 
elevation  levels  at  all  points  with  a  hill  beginning  to  appear  near  middle 
life  and  a  shift  in  the  apex  of  the  hill  as  life  progresses.  The  contour  maps 
thus  represent  a  rather  graphic  moving-picture  of  the  life  history  of  the  effect 
of  burning  conditions  on  the  growth  of  interface  resistance. 




The experiment revealed many effects that had not previously been suspected,
one of which might be of interest to those of you involved in the operation of
electron tube computers. This effect showed that the cut-off condition, zero
plate current, is not necessarily the worst condition of operation of electron
tubes as far as interface resistance growth is concerned. The cure for the
so-called “sleeping sickness” of tubes in computers operating for long periods
of time at cut-off, by maintaining a low current drain, is not necessarily the
best action to take since it has been demonstrated that extremely high values
of interface can be formed at the low current drain compared to high current or
zero current for some alloys. A surer cure of the problem is to use tubes
with passive alloys equivalent to the P-50 alloy tested in this experiment and
obtain relatively low values of interface resistance over the life of the tube
and freedom from the effects of operating conditions.

The  successful  conclusion  of  this  experiment  is  due  in  large  measure  to 
the  use  of  the  statistical  approach  to  the  Design  of  Experiments.  The  results 
obtained  are  conclusive  in  the  areas  covered  by  the  experimental  design. 

Those  points  that  are  missing,  such  as  the  effects  of  sampling  within  a  given 
manufacturer’s  production  over  a  period  of  time,  can  now  be  obtained  with  a 
fairly  simple  experimental  design.  The  contour  data  obtained  for  the  burning 
effects  need  checking  both  as  to  the  reproducibility  of  the  data  for  the 
operating  conditions  used  for  these  tests  and  the  interpolation  of  the  data 
between points. A partial factorial design covering an experiment to obtain
this  additional  data  is  presently  being  examined.  While  the  debt  owed  to  the 
field  of  statistics  is  great,  it  is  hoped  that  at  least  a  partial  repayment 
has  been  made  to  this  field  through  the  techniques  of  analysis  evolved  during 
the  program. 

ACKNOWLEDGMENTS


The author wishes to thank the following personnel of Briggs Associates,
Inc.: R. Dickson, who developed the statistical techniques discussed in this
paper, and T. A. Briggs for making available the figures used.


Fig. I - DESIGN OF LIFE OPERATING CONDITIONS. Twin triodes; heater voltages
5.7, 6.3, and 6.9 V ac; plate currents 0, 0.9, and 9.0 mA. Cathode materials
used in life lots: P50, 220, 330, A32, repeated for each of 9 large squares.
Tube quantities used in life lots: 13 per cell (156 per large square),
repeated for each of 9 large squares. Total twin triodes = 1404 tubes.

Fig. II - INTERFACE RESISTANCE AS A FUNCTION OF CATHODE ALLOY. Triodes,
Mfgr 1, all life conditions. Ordinate: interface resistance, ohms.

Fig. III - INTERFACE RESISTANCE AS A FUNCTION OF ALLOY AND MANUFACTURER.
Ordinate: interface resistance, ohms.

Fig. IV - INTERFACE READING CONTOURS, TRIODES. Alloy P-50, all manufacturers,
2000 life hours. Axes: filament voltage and plate current (mA).
σ log = 0.616/√72 = 0.0726 (18% reading shift).

Fig. V - INTERFACE READING CONTOURS, TRIODES. Alloy 220, manufacturer 1,
5000 life hours. Axes: filament voltage and plate current (mA).
σ log = 0.331/√24 = 0.331/4.9 = 0.0676 (17% reading shift).

Fig. VI - INTERFACE READING CONTOURS, TRIODES. Alloy A-32, manufacturer 1,
5000 life hours. Axes: filament voltage and plate current (mA).
σ log = 0.447/√24 = 0.447/4.9 = 0.0913 (23% reading shift).
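The one sigma annotations on the contour figures appear to follow from dividing a standard deviation of log readings by the square root of the number of readings averaged, then converting back to a percentage shift on the arithmetic scale. The sketch below reproduces that arithmetic; the function name is hypothetical, and the interpretation of 72 and 24 as numbers of readings is an assumption.

```python
import math

def one_sigma_shift(sigma_log, n_readings):
    """Standard error of an average of log10 readings and the equivalent
    percentage shift of the reading on the arithmetic scale."""
    standard_error = sigma_log / math.sqrt(n_readings)
    percent_shift = (10 ** standard_error - 1) * 100
    return standard_error, percent_shift

# (sigma of log readings, assumed number of readings) from the figure annotations.
cases = {
    "P-50, all manufacturers": (0.616, 72),
    "220, manufacturer 1": (0.331, 24),
    "A-32, manufacturer 1": (0.447, 24),
}
results = {label: one_sigma_shift(s, n) for label, (s, n) in cases.items()}

for label, (se, pct) in results.items():
    print(label, round(se, 4), round(pct))
```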


APPENDIX 


USE  OF  DICKSON  SINGIE-TABLE  METHOD  FOR  THREE-FACTOR  ANALYSIS 


Illustration  from  Sample  Chart 


Step 1. Enter data for individual readings (1).

Illustration: Enter 126 individual readings from 7 tubes × 2 sections
× 3 voltage levels × 3 current levels.


Step 2. Calculate sum and sum of squares for individual readings in first
box of chart covering replications of experiment at single level of each of
three factors. Enter sum of squares in row labeled Σ² (2a) and average of sum
in row labeled Ave (2b). Repeat for each 3-way level.

Illustration: Sum of squares obtained for individual readings for
Section (1) at E_1 = 6.9 volts and I_1 = 9.0 mA. The sum of these same
readings is averaged.

Σ²:  (1.176)² + (1.230)² + (1.146)² + (1.672)² + (0.954)² + (1.079)² + (1.204)² = 10.529

Ave: (1.176 + 1.230 + 1.146 + 1.672 + 0.954 + 1.079 + 1.204)/7 = 1.209

The value 10.529 represents a partial sum of squares of individual readings, ΣX².

The value 1.209 is X̄_SIE = ΣX/7.

Process is repeated 17 times to obtain total of 18 values of partial sum of
squares and 18 values of X̄_SIE.
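The cell calculation in this step can be sketched in a few lines. The helper name is hypothetical; the seven readings are those quoted in the illustration for Section (1) at 6.9 volts and 9.0 mA.

```python
def sum_of_squares_and_average(values):
    """Return the sum of squares and the average of a list of readings."""
    return sum(v * v for v in values), sum(values) / len(values)

# Seven readings quoted in the illustration for one section-current-voltage cell.
readings = [1.176, 1.230, 1.146, 1.672, 0.954, 1.079, 1.204]

cell_sum_sq, cell_ave = sum_of_squares_and_average(readings)
print(round(cell_sum_sq, 3), round(cell_ave, 3))  # 10.529 1.209
```

Repeating the call for each of the 18 cells fills the body of the chart.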


Step 3. Calculate sum and sum of squares for average values obtained in row
labeled Σ² under column labeled Ave (3a). Enter average of sum in row labeled
Ave (3b). Repeat for each level of two remaining factors.

Illustration: Sum of squares of the two averages over Sections (1)
and (2) for E_1 = 6.9 volts and I_1 = 9.0 mA is obtained.
The average of these two averages is also calculated.

Σ²:  (1.209)² + (1.075)² = 2.617

Ave: (1.209 + 1.075)/2 = 1.142

The value 2.617 represents a partial sum of squares
of X̄_SIE values or a partial sum of (X̄_SIE)².

The value 1.142 is X̄_IE = ΣX̄_SIE/2.

Process is repeated 8 times to obtain total of 9 values
of partial sums of squares and 9 values of X̄_IE.


(All entries in the body of table have been made.
The next step represents the first of the peripheral
calculations.)


Step 4. Calculate sum and sum of squares for average values at single level
of first factor and second factor across levels of third factor. Enter sum of
squares in row labeled Σ² and column (1) under E2 on righthand periphery of
chart (4a). Enter average of sum of averages in row labeled Ave and column (1)
under E2 on righthand periphery of chart (4b). Repeat for each level of second
factor.

Illustration: Sum of squares of the averages for Section (1) at
I_1 = 9.0 mA for three levels of E is obtained.
The average of these three averages is also calculated.

Σ²:  (1.209)² + (1.206)² + (1.020)² = 3.957

Ave: (1.209 + 1.206 + 1.020)/3 = 1.145

The value 3.957 represents a partial sum of squares
of X̄_SIE values or a partial sum of (X̄_SIE)².

These data are a duplication of the partial sums of
squares calculated in Step 2 and can be used as a
computation check of the total sum of (X̄_SIE)².

The value 1.145 is the average of a partial sum of X̄_SIE.

Process is repeated 2 times to obtain a total of
3 values of partial sum of squares and 3 averages
of partial sums of X̄_SIE.


5. Repeat Step 4 for all levels of first factor, entering data in appropriate column of first factor in righthand peripheral area (5a and 5b).


This step obtains additional partial sums of squares of $\bar{X}_{SIE}$ values and the remaining averages of the partial sums of $\bar{X}_{SIE}$ values.


6. Repeat Step 4 for single level of first factor and third factor across levels of second factor. Enter sum of squares in row in bottom peripheral area labeled $\Sigma^2$ (6a). Enter average of sum of averages in row labeled Ave in bottom peripheral area (6b).


Sum of squares of the averages for Section (1) at $E_1$ = 6.9 volts for three levels of $I$ is obtained. The average of these three averages is also calculated.

$\Sigma^2 = (1.209)^2 + (1.291)^2 + (1.184)^2 = 4.530$

$\text{Ave} = (1.209 + 1.291 + 1.184)/3 = 1.228$

The value 4.530 represents a partial sum of squares of $\bar{X}_{SIE}$ values or a partial sum of $(X_{SIE})^2/(7)^2$.

The value of 1.228 is the average of a partial sum of $\bar{X}_{SIE}$.

The process is repeated 2 times to obtain a total of 3 values of partial sums of squares and 3 averages of partial sums of $\bar{X}_{SIE}$.


7. Repeat Step 6 for all levels of first factor, entering data in appropriate column of first factor in bottom peripheral area (7a and 7b).


This step obtains additional values of partial sums of squares of $\bar{X}_{SIE}$ values and the remaining averages of the partial sums of $\bar{X}_{SIE}$ values.


8. Calculate sum and sum of squares of averages found in Step 3 across first level of the second factor. Enter sum of squares in row labeled $\Sigma^2$ under column labeled Ave in righthand peripheral area (8a). Enter average of the sum of averages in the row labeled Ave under column labeled Ave in the righthand peripheral area (8b). Repeat across each level of the second factor.

Sum of squares for the $\bar{X}_{IE}$ values across $I_1$ = 9.0 mA is obtained. The average of these same values is also calculated.

$\Sigma^2 = (1.142)^2 + (1.192)^2 + (1.190)^2 = 4.141$

$\text{Ave} = (1.142 + 1.192 + 1.190)/3 = 1.175$

Note that (4b) and (5b) values can be used to obtain the same average:

$(1.145 + 1.204)/2 = 1.175$

The value 4.141 represents a partial sum of squares of $\bar{X}_{IE}$ values or a partial sum of $(X_{IE})^2/(7 \times 2)^2$.

The value of 1.175 is the average of the total sum of $\bar{X}_{IE}$.

The process is repeated 2 times to obtain a total of 3 values of partial sums of squares and 3 averages of the total sum of $\bar{X}_{IE}$.
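The cross-check noted in this step (the same average reached from boxes 4b and 5b) can be written out as a short Python sketch. The variable names and the tolerance of 0.001, chosen to absorb three-decimal rounding, are assumptions made for illustration.

```python
# Averages over sections at 9.0 mA, one per voltage level (from Step 3).
x_ie_at_i1 = [1.142, 1.192, 1.190]
# Averages over voltage at 9.0 mA, one per section: boxes (4b) and (5b).
x_si_at_i1 = [1.145, 1.204]

ave_from_step3 = sum(x_ie_at_i1) / len(x_ie_at_i1)
ave_from_4b_5b = sum(x_si_at_i1) / len(x_si_at_i1)

TOLERANCE = 0.001  # assumed allowance for rounding of the tabulated averages
routes_agree = abs(ave_from_step3 - ave_from_4b_5b) <= TOLERANCE
print(round(ave_from_step3, 4), round(ave_from_4b_5b, 4), routes_agree)
```

A disagreement larger than the rounding allowance would point to an arithmetic slip in one of the peripheral entries.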


9. Calculate sum and sum of squares of averages found in Step 3 across first level of third factor. Enter sum of squares in row labeled $\Sigma^2$ in bottom peripheral area (9a). Enter average of the sum of averages in the row labeled Ave in bottom peripheral area (9b). Repeat across each level of the third factor.

Sum of squares for the $\bar{X}_{IE}$ values across $E_1$ = 6.9 volts is obtained. The average of these same values is also calculated.

$\Sigma^2 = (1.142)^2 + (1.302)^2 + (1.204)^2 = 4.366$

$\text{Ave} = (1.142 + 1.302 + \ldots)/3 = 1.204$

Note that (6b) and (7b) values can be used to obtain the same average:

$(1.228 + 1.180)/2 = 1.204$

The value 4.366 represents a partial sum of squares of $\bar{X}_{IE}$ values or a partial sum of $(X_{IE})^2/(7 \times 2)^2$.

The value of 1.204 is the average of the total sum of $\bar{X}_{IE}$.

The process is repeated 2 times to obtain a total of 3 partial sums of squares and 3 averages of the total sum of $\bar{X}_{IE}$.

The partial sums of squares are essentially a duplication of data obtained in Step 8 and are used as a computation check of the total sum of $(\bar{X}_{IE})^2$.


10. Calculate the sum of squares of the averages found in Steps 4 and 5 for the first level of the second factor. Enter in box labeled (10) on Step Procedure chart. Repeat for each level of the second factor.

Sum of squares for the $\bar{X}_{SI}$ values for $I_1$ = 9.0 mA is obtained.

$(1.145)^2 + (1.204)^2 = 2.761$

The value of 2.761 represents a partial sum of squares of $\bar{X}_{SI}$ values or a partial sum of $(X_{SI})^2/(7 \times 3)^2$.

The process is repeated 2 times to obtain a total of 3 partial sums of squares.


11. Calculate the sum of squares of the averages found in Steps 6 and 7 for the first level of the third factor. Enter in box labeled (11) on Step Procedure chart. Repeat for each level of the third factor.

Sum of squares for the $\bar{X}_{SE}$ values for $E_1$ = 6.9 volts is obtained.

$(1.228)^2 + (1.180)^2 = 2.900$

The value 2.900 represents a partial sum of squares of $\bar{X}_{SE}$ values or a partial sum of $(X_{SE})^2/(7 \times 3)^2$.

The process is repeated 2 times to obtain a total of 3 partial sums of squares.


12. Sum the values of partial sums of squares found in Step 2 (6 values) across the first level of the second factor. Enter in box labeled (12) on Step Procedure chart. Repeat for each level of the second factor.

Sum of the partial sums of $X^2$ values is obtained for $I_1$ = 9.0 mA.

10.529 + 8.303 + 10.252 + 9.812 + 7.067 + 13.087 = 59.590

The value of 59.590 is a further summing of the partial sum of $X^2$ values.

The process is repeated 2 times to obtain a total of 3 partial sums of squares.


13. Sum the values of partial sums of squares found in Step 2 (6 values) across the first level of the third factor. Enter in box labeled (13) on Step Procedure chart. Repeat process for each level of the third factor.

Sum of the partial sums of $X^2$ values is obtained for $E_1$ = 6.9 volts.

10.529 + 6.303 + 11.689 + 12.115 + 9.637 + 9.466 = 61.939

The value 61.939 is a further summing of the partial sum of $X^2$ values.

The process is repeated 2 times to obtain a total of 3 partial sums of squares. The partial sums of squares are a duplication of data obtained in Step 12 and are used as a computation check of the sum of $X^2$.


14. Sum the values obtained in Step 12 or Step 13. Enter in box labeled (14) on Step Procedure chart.

The total sum of $X^2$ values is obtained.

59.590 + 77.091 + 71.655 = 208.336

61.939 + 69.966 + 76.431 = 208.336

The value 208.336 is the total sum of $X^2$.
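The agreement between the two routes to the total sum of squares can be checked mechanically. The sketch below uses the six subtotals quoted in this step; the variable names are illustrative assumptions.

```python
# Subtotals of X^2 by level of current (Step 12) and by level of voltage (Step 13).
by_current = [59.590, 77.091, 71.655]
by_voltage = [61.939, 69.966, 76.431]

total_from_step12 = sum(by_current)
total_from_step13 = sum(by_voltage)

# Both routes add the same individual readings, so the totals should coincide.
totals_agree = abs(total_from_step12 - total_from_step13) < 1e-6
print(f"{total_from_step12:.3f} {total_from_step13:.3f} {totals_agree}")
```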


15. Calculate the sum and sum of squares of the averages found in Step 4. Enter the sum of squares in the box labeled (15a). Enter the average of the sum of averages in the box labeled (15b). Repeat the process for the averages found in Step 5.

The sum of squares of the $\bar{X}_{SI}$ values is obtained. The average of these same values is also calculated.

$(1.145)^2 + (1.312)^2 + (1.219)^2 = 4.518$

$(1.145 + 1.312 + 1.219)/3 = 1.225$

The value 4.518 represents a partial sum of squares of $\bar{X}_{SI}$ values or a partial sum of $(X_{SI})^2/(7 \times 3)^2$.

The value 1.225 represents the average of the total sum of $\bar{X}_{SI}$.

The process is repeated to find a total of 2 values of partial sum of squares and 2 averages of the total sum of $\bar{X}_{SI}$.


16. Repeat Step 15 for the averages found in Step 6. Enter the sum of squares in the box labeled (16a). Enter the average of the sum of averages in the box labeled (16b). Repeat the process for the averages found in Step 7.

The sum of squares of the $\bar{X}_{SE}$ values is obtained. The average of these same values is also calculated.

$(1.228)^2 + (1.235)^2 + (1.212)^2 = 4.502$

$(1.228 + 1.235 + 1.212)/3 = 1.225$

The value 4.502 represents a partial sum of squares of $\bar{X}_{SE}$ values or a partial sum of $(X_{SE})^2/(7 \times 3)^2$.

The value 1.225 represents the average of the total sum of $\bar{X}_{SE}$. As a check on the computation process, it should be equal to the average for the sum of $\bar{X}_{SI}$ found in Step 15.

The process is repeated to find a total of 2 values of the partial sum of squares and 2 averages of the total sum of $\bar{X}_{SE}$.


17. Calculate the sum and sum of squares of the average values found in Step 8. Enter the sum of squares in the box labeled (17a). Enter the average of the sum of averages in the box labeled (17b).

The sum of squares of the $\bar{X}_{I}$ values is obtained. The average of these same values is also calculated.

$(1.175)^2 + (1.336)^2 + (1.272)^2 = 4.784$

$(1.175 + 1.336 + 1.272)/3 = 1.261$

The value 4.784 represents the total sum of squares of the $\bar{X}_{I}$ values or the total sum of $(X_{I})^2/(7 \times 3 \times 2)^2$.

The value 1.261 represents the grand average.


18. Repeat Step 17 for the average values found in Step 9. Enter the sum of squares in the box labeled (18a). Enter the average in the box labeled (18b).

The sum of squares of the $\bar{X}_{E}$ values is obtained. The average of these same values is also calculated.

$(1.204)^2 + (1.270)^2 + (1.308)^2 = 4.773$

$(1.204 + 1.270 + 1.308)/3 = 1.261$

The value 4.773 represents the total sum of squares of the $\bar{X}_{E}$ values or the total sum of $(X_{E})^2/(7 \times 3 \times 2)^2$.

The value 1.261 represents the grand average and should check the value found in Step 17.


19. Calculate the sum and sum of squares of the average values found in Steps 15 or 16. Enter the sum of squares in the boxes labeled 19. The average of the sum of averages is not entered but should check the value entered in boxes 17b and 18b.

The sum of squares of the $\bar{X}_{S}$ values is obtained. The average of these same values is also calculated.

$(1.225)^2 + (1.296)^2 = 3.180$

$(1.225 + 1.296)/2 = 1.261$

The value 3.180 represents the total sum of squares of the $\bar{X}_{S}$ values or the total sum of $(X_{S})^2/(7 \times 3 \times 3)^2$.

The value 1.261 represents the grand average and should check the value found in Steps 17 and 18.
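Steps 17 to 19 reach the grand average by three routes. A minimal Python sketch of that consistency check follows, using the marginal averages quoted in those steps; the helper name `mean` and the rounding tolerance of 0.001 are assumptions.

```python
def mean(values):
    """Arithmetic mean of a list of averages."""
    return sum(values) / len(values)


x_bar_i = [1.175, 1.336, 1.272]  # averages by level of current (Step 17)
x_bar_e = [1.204, 1.270, 1.308]  # averages by level of voltage (Step 18)
x_bar_s = [1.225, 1.296]         # averages by section (Step 19)

grand_averages = [mean(x_bar_i), mean(x_bar_e), mean(x_bar_s)]

TOLERANCE = 0.001  # assumed allowance for three-decimal rounding
spread = max(grand_averages) - min(grand_averages)
consistent = spread <= TOLERANCE
print([round(g, 4) for g in grand_averages], consistent)
```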


20. Sum the values of the sum of squares found in Steps 4 and 5. Enter in the box labeled (20) on the Step Procedure chart.

The sum of squares of the $\bar{X}_{SIE}$ values is obtained.

3.957 + 4.393 + 5.162 + 5.550 + 4.461 + 5.305 = 28.829

The value 28.829 represents the total sum of squares of $\bar{X}_{SIE}$ values or the sum of $(X_{SIE})^2/(7)^2$.


21. Sum the values of the sum of squares found in Steps 6 and 7. Enter in the box labeled (21) on the Step Procedure chart.

The sum of squares of the $\bar{X}_{SIE}$ values is obtained as a computation check.

4.530 + 4.209 + 4.586 + 5.131 + 4.464 + 5.909 = 28.829

This value should check the value found in Step 20.

The sum of the 9 values of sum of squares found in Step 3 should also check this value.
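Because each tabulated entry is rounded to three decimals, the two totals in Steps 20 and 21 can differ in the last digit. The sketch below makes that allowance explicit; the rounding budget of 0.0005 per entry is an assumption about how the table was rounded.

```python
step20_entries = [3.957, 4.393, 5.162, 5.550, 4.461, 5.305]  # from Steps 4 and 5
step21_entries = [4.530, 4.209, 4.586, 5.131, 4.464, 5.909]  # from Steps 6 and 7

total20 = sum(step20_entries)
total21 = sum(step21_entries)

# Each entry may be off by up to half a unit in the third decimal place.
tolerance = 0.0005 * (len(step20_entries) + len(step21_entries))
totals_check = abs(total20 - total21) <= tolerance
print(f"{total20:.3f} {total21:.3f} {totals_check}")
```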




22. Sum the values of the sum of squares found in Step 8. Enter in the box labeled (22) on the Step Procedure chart.

The sum of squares of the $\bar{X}_{IE}$ values is obtained.

4.141 + 5.357 + 4.868 = 14.366

The value 14.366 represents the total sum of squares of the $\bar{X}_{IE}$ values or the sum of $(X_{IE})^2/(7 \times 2)^2$.

23. Sum the values of the sum of squares found in Step 9. Enter in the box labeled (23) on the Step Procedure chart.

The sum of squares of the $\bar{X}_{IE}$ values is obtained.

4.366 + 4.849 + 5.151 = 14.366

This value should check the value found in Step 22.

24. Sum the values of the sum of squares found in Step 10. Enter in the box labeled (24) on the Step Procedure chart.

The sum of squares of the $\bar{X}_{SI}$ values is obtained.

2.761 + 3.571 + 3.239 = 9.570

The value 9.570 represents the total sum of squares of the $\bar{X}_{SI}$ values or the sum of $(X_{SI})^2/(7 \times 3)^2$.

This value should check the sum of the 2 values obtained for sum of squares in Step 15.

25. Sum the values of the sum of squares found in Step 11. Enter in the box labeled (25) on the Step Procedure chart.

The sum of squares of the $\bar{X}_{SE}$ values is obtained.

2.900 + 3.228 + 3.437 = 9.566

The value 9.566 represents the total sum of squares of the $\bar{X}_{SE}$ values or the sum of $(X_{SE})^2/(7 \times 3)^2$.

This value should check the sum of the 2 values obtained for sum of squares in Step 16.


SUMMATION  OF  DATA 


The calculated values are now entered in a Summary Table which is normally present at the bottom of the Three-Factor Analysis Table. Only one operation on the calculated values is required in the transfer of data to the Summary Table, i.e., find the square of the grand average in box 17b or 18b for entry in the row labeled Correction Factor in the Summary Table. The steps involved in the completion of the Summary Table are enumerated below:

Step  1.  Enter  the  appropriate  values  of  sums  of  squares  from  the  Three-Factor 
Analysis  Table  in  the  first  column. 

Step 2. Enter the number of sections (or number of readings) involved in the individual terms of the summation of sums of squares. Thus, the residual sum of squares consists of the sum of squares of individual readings and, therefore, the number of sections involved is one. For the three-factor interaction term, the square of the sum over 7 sections is involved, so this number is entered in the second row. The two-factor interaction terms involve the square of the sum over the number of replications times the number of levels of the third factor, i.e., for SI interaction terms 7 × 3 (7 replications × 3 levels of E). Likewise, the main effect terms are calculated from the number of replications and the number of levels of the two remaining factors, i.e., for the S effect 7 × 3 × 3 represents the number of sections involved (7 replications × 3 levels of E × 3 levels of I). Finally, the correction factor involves the square of the sum over the total number of readings or 7 × 3 × 3 × 2.
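The rule in Step 2 amounts to multiplying the number of replications by the levels of every factor that does not appear in the term. A minimal Python sketch, assuming the design described here (7 replications, 2 sections, 3 currents, 3 voltages); the function name `readings_per_term` is hypothetical.

```python
REPLICATIONS = 7
LEVELS = {"S": 2, "I": 3, "E": 3}  # sections, current levels, voltage levels


def readings_per_term(term):
    """Number of readings behind each average for a Summary Table row.

    `term` lists the factors in the row, e.g. "SI" or "S"; the empty
    string stands for the correction factor (all factors averaged out).
    """
    n = REPLICATIONS
    for factor, levels in LEVELS.items():
        if factor not in term:
            n *= levels
    return n


for term in ["SIE", "SI", "SE", "IE", "S", "I", "E", ""]:
    print(term or "C.F.", readings_per_term(term))
```

The residual row is not covered by the function because it is based on individual readings, so its entry is simply one.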


Step 3. Multiply the value of the sum of squares listed in Column 1 by the number of sections listed in Column 2. This step is required to adjust for the method of calculation in terms of averages rather than the conventional method of direct summation. Thus, the sum of squares for the three-factor interaction term was found from

$\Sigma (\bar{X}_{SIE})^2$

The term actually desired for the analysis of variance is

$\dfrac{\Sigma (X_{SIE})^2}{n}$

where n = the number of readings involved in each summation.

Since $\bar{X}_{SIE} = \dfrac{X_{SIE}}{n}$,

$\dfrac{\Sigma (X_{SIE})^2}{n} = \dfrac{n^2\, \Sigma (\bar{X}_{SIE})^2}{n}$

or $\dfrac{\Sigma (X_{SIE})^2}{n} = n\, \Sigma (\bar{X}_{SIE})^2$
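The identity behind Step 3 can be confirmed numerically. In the sketch below the two cell totals are those implied by the averages 1.209 and 1.075 under the assumption of 7 readings per cell; they are not tabulated in the source.

```python
n = 7  # readings per cell

# Cell totals implied by the averages 1.209 and 1.075 (assumed, for illustration).
cell_totals = [n * 1.209, n * 1.075]
cell_averages = [total / n for total in cell_totals]

# Conventional form: sum of squared totals divided by n.
conventional = sum(total ** 2 for total in cell_totals) / n
# Form used in the table: n times the sum of squared averages.
from_averages = n * sum(avg ** 2 for avg in cell_averages)

print(abs(conventional - from_averages) < 1e-9)
```

Working with averages keeps the intermediate figures small, and the single multiplication by n restores the conventional sum of squares.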




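Step 4. The final values of the sum of squares are found from the following equations:

Final Sum of Squares for S = Adjusted Sum of Squares for S - Correction Factor, or

$\Sigma^2 S_F = \Sigma^2 S_A - \text{C.F.}$

$\Sigma^2 I_F = \Sigma^2 I_A - \text{C.F.}$

$\Sigma^2 E_F = \Sigma^2 E_A - \text{C.F.}$

$\Sigma^2 IE_F = \Sigma^2 IE_A - \text{C.F.} - \Sigma^2 E_F - \Sigma^2 I_F$

$\Sigma^2 SE_F = \Sigma^2 SE_A - \text{C.F.} - \Sigma^2 S_F - \Sigma^2 E_F$

$\Sigma^2 SI_F = \Sigma^2 SI_A - \text{C.F.} - \Sigma^2 S_F - \Sigma^2 I_F$

$\Sigma^2 SIE_F = \Sigma^2 SIE_A - \text{C.F.} - \Sigma^2 SI_F - \Sigma^2 SE_F - \Sigma^2 IE_F - \Sigma^2 S_F - \Sigma^2 I_F - \Sigma^2 E_F$

$\text{Res}_F = \text{Res}_A - \Sigma^2 SIE_A$

The final values of sum of squares are equivalent to the values normally tabulated in an Analysis of Variance Table and the remainder of the table is conventional.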
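The subtractions in this step follow one pattern: from each adjusted value, remove the correction factor and the final values of every lower-order term contained in it. A Python sketch of that pattern is given below. The numbers are made up for illustration and are not taken from the Summary Table; `final_sums_of_squares` is a hypothetical name, and the residual row is left out.

```python
from itertools import combinations


def final_sums_of_squares(adjusted, correction_factor):
    """Convert adjusted sums of squares to final (ANOVA) sums of squares."""
    final = {}
    # Process main effects first, then two-factor, then three-factor terms.
    for term in sorted(adjusted, key=len):
        lower_terms = [
            "".join(combo)
            for size in range(1, len(term))
            for combo in combinations(term, size)
        ]
        final[term] = (
            adjusted[term]
            - correction_factor
            - sum(final[lower] for lower in lower_terms)
        )
    return final


# Illustrative (made-up) adjusted values and correction factor.
adjusted = {"S": 201.0, "I": 202.5, "E": 201.8,
            "SI": 204.0, "SE": 203.1, "IE": 204.9, "SIE": 207.2}
cf = 200.4

final = final_sums_of_squares(adjusted, cf)
for term, value in final.items():
    print(term, round(value, 3))
```

A useful check on the result is that the final values add up to the adjusted three-factor value minus the correction factor.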


STEP PROCEDURE FOR FILLING IN
DICKSON THREE-FACTOR ANALYSIS TABLE

THE  APPLICATION  OF  EXPERIMENTAL  DESIGN 

TO  A 

RADAR  TARGET  ACQUISITION  SYSTEM 

Dr.  Erwin  Biser,  Harvey  Eisenberg  &  George  Millman 

Systems  Division 

Surveillance  Department,  Evans  Area 
U. S. Army Signal R & D Laboratory
Fort Monmouth, N. J.

TABLE  OF  CONTENTS 

ABSTRACT

INTRODUCTION

TEST PLAN

TEST PROCEDURE

APPLICATION OF A 2^3 DESIGN

NATURE OF THE ACQUISITION TIME FUNCTION

EFFECTS OF VARIABLES ON ACQUISITION TIME

RESULTS  AND  CONCLUSIONS 

RECOMMENDATIONS 

ACKNOWLEDGMENTS 

SHORT  GLOSSARY  OF  TERMS 

ABSTRACT. 2^n factorial designs were applied to the problem of minimizing target acquisition time for standard and modified radar tracking systems in a series of tests at USASRDL.

With the aid of 2^ and 2^ factorial designs, the modification was shown to reduce target acquisition time and target transfer failure rates significantly. A determination was made of the dependence of acquisition time upon target velocity, course type, radar-crew combination, target range at designation, and time lapses.

INTRODUCTION. 

a. The Problem. The radar tracking system under analysis required the transfer of target position information from an acquisition radar to a tracking radar. The latter radar was slewed from a random point.

(An acquisition radar is one that periodically scans a predetermined volume of space, searching for enemy targets. A track radar is one that closely follows a target, obtaining present position information and velocity for tracking and missile firing purposes.)




Design  of  Experiments 


b. The Objective. The objective of the analysis is to determine the conditions under which target acquisition time is a minimum. This is the "yield of the process". The acquisition time function is presumed to be a function of many variables, such as:

1) The absence or presence of the modification, the Height Comparator, which presents the third target coordinate (height) to the target track radar operators during the acquisition process. (Note that this modification is not possible when the radar is operating by itself, and not as part of a defense system.)

2) Proficiency of the four-man crew. (Two crews were used.)

3) Target range at designation, or target slew-range.

4) Target velocity. (Slow, medium and fast aircraft were used.)

5) Altitude maneuver of the target.

6) Type of target course. (Radial or tangential courses.)

7) The effect of time lapses between sets of data.

8) Target transfer failure rate.

9) Operator overshoot.

10) Human Engineering aspects of target acquisition.

11) Initial designation failures ("warmup period").

TEST PLAN. Close control of the flight pattern was accomplished by means of the reference radar plotting board and UHF radio.

To avoid pitfalls in future test planning, it is pertinent to add the following remarks. The final test plan and experimental design were quite different from the one originally conceived. Originally, it had been decided to acquire the target at definite points in space and in particular at:

Ranges: 8, 18, 28, 38 thousands of yards

Azimuths: 4800, 5600, 6400 mils (W, NW, N)

Altitudes: Varying randomly among three altitudes, such as 6, 8, 10 thousands of feet.

The original concepts were revised when it became apparent that insufficient data would be produced. When the aircraft reached the point in space, the three radars were not always ready. When the radars were ready, the aircraft had often drifted off course. To increase efficiency it was decided to acquire targets in a random fashion, spotchecking afterwards to insure distribution in range and altitude.


VARIABLES USED IN TEST PLAN

1. MODIFICATION

Level 1 : Modification in use during target acquisition.
Level 2 : Modification not in use during target acquisition (standard mode of operation).

2. RADAR - CREW COMBINATION

Level 1 : Radar #1 with its "permanent" crew.
Level 2 : Radar #2 with its "permanent" crew.

3. RANGE OF TARGET AT DESIGNATION

Level 1 : Short range, e.g., less than 20,000 yds.
Level 2 : Long range, e.g., greater than 20,000 yds.

Fig 1


VARIABLES USED IN TEST PLAN (CONT.)

4. TARGET COURSE

Level 1 : Radial course (e.g., azimuth angle constant).
Level 2 : Tangential course (e.g., azimuth angle changing rapidly).

5. TIME LAPSE

Level 1 : October series of tests.
Level 2 : April series of tests.

6. AIRCRAFT (SIZE - VELOCITY - ALTITUDE COMBINATION)

Level 1 : Plane #1, a slow propeller-driven plane used as a statistical control.
Level 2 : Plane #2.
Level 3 : Plane #3.

Fig 2


Fig 3  CONFIGURATION




An  additional  revision  was  required  when  fast  aircraft  were  flown.  The 
fast  aircraft  consumed  large  amounts  of  fuel,  reducing  the  data  recording 
session,  if  flown  at  lower  altitudes.  In  general,  faster  planes  fly  higher, 
and  it  would  have  been  very  expensive  to  separate  the  effects  of  velocity 
and  altitude.  This  rapidly  became  evident  after  an  initial  try  and  the 
test  plan  was  revised  accordingly. 

The final test plan considered the variables shown in Figs. 1 and 2.

TEST  PROCEDURE.  Three  aircraft  were  flown  against  three  tracking  radars 
as  shown  in  Fig.  3. 

One radar was used as a target reference source and flight control center. The next radar, called radar #1 in the report, was used in the modified mode of acquisition while the third radar, called radar #2 in the report, was used in the standard mode. After approximately ten acquisitions, radars #1 and #2 changed their modes of acquisition.

The essential difference among the targets was that of velocity. Target #1 (plane #1) was the slowest; target #3 (plane #3) was the fastest. Average acquisition time and transfer failure rates were expected to increase with velocity, and this was verified by the analysis. The variable of target altitude was confounded with velocity, since faster planes tend to fly higher.

The two models of the Height Comparator (the modification) differed slightly. The model in radar #1 needed an operator to slew the antenna elevation, while the model in radar #2 was completely automatic. This effect was considered minor and is confounded with the radar-crew variable.

Plane #1 (L-19), the slowest plane, was varied continuously in altitude to prevent the elevation operators from anticipating the target elevation angle. It was found that this aspect of the test plan was rarely considered in the field. This can be easily accomplished with slow aircraft. Planes #2 and #3 were flown at constant altitude and were varied only slightly in altitude during the test since their speed and position change made each designation appear as a new target. In general, interdependence between any two successive acquisitions was reduced by varying the elevation angle randomly as much as ± 250 mils.

The "Count down to acquire" command was given only if both designation
radars  obtained  good  video  on  the  PPI  display.  Prior  to  designation  each 
track  radar  was  off  target  for  a  minimum  of  one  minute,  standing  by  at  a 
pre-determined  range,  zero  azimuth,  and  zero  elevation.  Each  designation 
was  performed  simultaneously  by  both  radars.  The  designation  time  and  tar¬ 
get  position  were  recorded.  The  time  clock  was  activated  when  the  desig¬ 
nation  operator  pressed  his  designate  button  and  was  deactivated  when  the 
track  operators  threw  the  automatic  track  switch.  Slew  time  was  also  re¬ 
corded  but  was  not  used  in  the  analysis  except  as  a  check.  (Average  slew 
time  was  4  seconds.) 

APPLICATION OF A 2^3 DESIGN. To elucidate the application of experimental designs, let us examine Data Set #6. Three factors, each with two levels, were studied. The factors, upper levels, and lower levels are defined in Fig. 4.


DATA SET #6 : A 2^3 EXPERIMENT

A : MODIFICATION FACTOR

a1 = Modification in use during target acquisition.
a2 = Modification not in use during target acquisition (standard mode of operation).

B : TARGET RANGE FACTOR

b1 = Short range. (Range less than 20,000 yards.)
b2 = Long range. (Range equal to or greater than 20,000 yards.)

C : AIRCRAFT - VELOCITY FACTOR

c1 = Plane #1
c2 = Plane #2

Fig 4

AVERAGES FOR DATA SET #6 (20 REPLICATIONS)

Columns: Treatment Symbol; Average Acquisition Time (secs.)


DATA SET #6 : A 2^3 EXPERIMENT (CONT.)

"A" Effect = $\frac{1}{4}\left[(a - (1)) + (ac - c) + (ab - b) + (abc - bc)\right]$

At b1c1 (Sht Rge, Rdr 1):
a - (1) = 10.13 - 7.76 = 2.36 seconds improvement

At b1c2 (Sht Rge, Rdr 2):
ac - c = 14.59 - 9.03 = 5.56 seconds improvement

At b2c1 (Lg Rge, Rdr 1):
ab - b = 8.76 - 5.86 = 2.90 seconds improvement

At b2c2 (Lg Rge, Rdr 2):
abc - bc = 13.04 - 9.18 = 3.86 seconds improvement

Average of four subeffects:

$\frac{1}{4}\left[(a - (1)) + (ac - c) + (ab - b) + (abc - bc)\right] = \frac{1}{4}(2.37 + 5.56 + 2.90 + 3.86) = 3.67$ seconds overall improvement attributable to the modification.

Fig 7
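The averaging of the four subeffects in Fig 7 can be reproduced directly from the treatment averages quoted there. The following Python sketch is illustrative only; the dictionary layout and variable names are assumptions.

```python
# Average acquisition times (seconds) by treatment combination, as quoted in Fig 7.
means = {
    "(1)": 7.76, "a": 10.13,
    "c": 9.03, "ac": 14.59,
    "b": 5.86, "ab": 8.76,
    "bc": 9.18, "abc": 13.04,
}

# Each subeffect compares the two levels of A with B and C held fixed.
sub_effects = [
    means["a"] - means["(1)"],
    means["ac"] - means["c"],
    means["ab"] - means["b"],
    means["abc"] - means["bc"],
]

a_effect = sum(sub_effects) / len(sub_effects)
print([round(s, 2) for s in sub_effects], round(a_effect, 2))
```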



Note  the  parallelism  or  independence  between  a  and- 

Fig 8

Effect Measured at


DATA SET #6 : A 2^3 EXPERIMENT (CONT.)

AB Interaction

If AB = 0, there is no interaction between A and B (Modification and Range). Thus, the modification would have the same effect on acquisition time for any target range.

At b2, the "A" Effect is:

$\frac{1}{2}\left[(ab - b) + (abc - bc)\right] = \frac{1}{2}(2.90 + 3.86) = 3.38$

At b1, the "A" Effect is:

$\frac{1}{2}\left[(a - (1)) + (ac - c)\right] = \frac{1}{2}(2.36 + 5.56) = 3.96$

The AB Interaction is:

$\frac{1}{2}(3.38 - 3.96) = -0.29$

Fig 10
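Main effects and interactions of a two-level, three-factor design can all be computed with one sign rule: a treatment average enters with a plus sign for each named factor at its upper level and a minus sign otherwise. The sketch below applies that standard rule to the averages quoted in Fig 7; the function name `effect` is an assumption.

```python
# Average acquisition times (seconds) by treatment combination, as quoted in Fig 7.
means = {
    "(1)": 7.76, "a": 10.13, "b": 5.86, "ab": 8.76,
    "c": 9.03, "ac": 14.59, "bc": 9.18, "abc": 13.04,
}


def effect(treatment_means, factors):
    """Effect estimate for a 2^3 design.

    `factors` names the effect in lower case: "a" for the A main effect,
    "ab" for the AB interaction, and so on.
    """
    total = 0.0
    for label, value in treatment_means.items():
        sign = 1
        for factor in factors:
            sign *= 1 if factor in label else -1
        total += sign * value
    return total / 4  # half the number of treatment combinations


print(round(effect(means, "a"), 2), round(effect(means, "ab"), 2))
```

With these averages the sign rule gives the same AB value as the two-stage calculation in Fig 10, and the remaining interactions follow by passing "ac", "bc", or "abc".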


The experiment was replicated twenty times. The averages are illustrated in Figs. 5 and 6.

The effect of the modification, the "A" factor, can be determined by means of the formula shown in Fig. 7.

The above slopes in Fig. 7 (2.36, 5.56, 2.90, 3.86) can be illustrated by considering either the pair of projections shown in Fig. 8 or the pair of projections shown in Fig. 9.

The interactions AB, AC, BC can be determined. AB (the effect on the modification factor of changing levels of the range factor) is given in Fig. 10.

For the remaining calculations, see Data Set #6.

NATURE OF THE ACQUISITION TIME FUNCTION: This series of tests sheds much light on the overall acquisition procedure. The acquisition time function can be discussed with respect to two points of view.

a. Mean Acquisition Time - The detailed discussions on mean acquisition time that follow can be generalized as follows:

1. When the slow radar target, plane #1, was flown as a control on a straight and level course, acquisition time averages were in the region of 8 to 10 seconds. The Height Comparator reduced acquisition time as much as 2 seconds depending upon the crew, training, length of test, and target course type. When the L-19 is not flown as a control, the average acquisition times can fall between 14 and 16 seconds.

2. Acquisition time averages on radar target #2, plane #2, were in the neighborhood of 14 to 16 seconds and were reduced significantly to approximately 9 seconds by the Height Comparator.

3. Acquisition time averages on radar target #3, plane #3, were in the vicinity of 16 to 19 seconds and were reduced significantly to about 14 or 15 seconds. The plane #3 data is biased in that a loss of skill had occurred in the five month interval between the plane #2 and plane #3 flights. Thus, the plane #3 averages were adjusted downwards when comparisons with the earlier data are made.

4. Target slew-range, time lapses, and the radar-crew combination had a considerable effect upon this time analysis.

b. Overshoot - The acquisition time function possesses a characteristic more commonly found in servo systems. If the radar return from the target is weak, or if the target is moving rapidly with respect to the slew time, the radar gates, or the crew's coordination capabilities, it is easy for the elevation operator to bypass, or overshoot, the target while slewing blindly in elevation. When this occurs, several seconds are needed to correct this error. From a mathematical point of view this produces a bimodal distribution, that is, the acquisitions are grouped about two means instead of one. If overshoot does not occur, the acquisition time averages fall within the interval of 6 to 12 seconds. If overshoot does occur, the acquisition averages fall within 14 to 1^ seconds. The overall average must therefore be representative of both averages,


FREQUENCY DISTRIBUTION OF ACQUISITION TIMES ON PLANE #3
(Horizontal axis: TARGET ACQUISITION TIME (SECONDS))

FREQUENCY DISTRIBUTION OF ACQUISITION TIME ON PLANE
(Horizontal axis: ACQUISITION TIME (SECONDS))

TARGET VELOCITY


indicating the percentage of the time that overshoot occurs. It is apparent from the data, and from the visual observations made during the test series, that the presence of the Height Comparator almost completely eliminates overshoot. The data on plane #3 indicates that the bimodal nature of the acquisition time function disappears at approximately 32 seconds, indicating that acquisitions made after 32 seconds are different in nature and suggesting that acquisitions after 32 seconds, not Zj.C, should be called failures. Note that the overshoot nature is not as apparent for the slower aircraft, but the data indicates that, if present, it occurs in less than 20 or 25 seconds.
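The remark that the overall average reflects the percentage of overshoot can be made concrete by treating the overall mean as a weighted mix of the two mode means. The sketch below rests on that two-group assumption, and the three input values are hypothetical, chosen only to lie in the ranges mentioned in the text.

```python
def overshoot_fraction(overall_mean, mean_without, mean_with):
    """Fraction of acquisitions with overshoot implied by a two-group mixture.

    Assumes overall_mean = (1 - p) * mean_without + p * mean_with.
    """
    if mean_with == mean_without:
        raise ValueError("the two mode means must differ")
    return (overall_mean - mean_without) / (mean_with - mean_without)


# Hypothetical values in seconds, not taken from the test data.
p = overshoot_fraction(overall_mean=11.0, mean_without=9.0, mean_with=16.0)
print(f"implied overshoot fraction: {p:.2f}")
```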

EFFECT OF VARIABLES ON ACQUISITION TIME: This series of tests was designed as a 2^n factorial experiment and the data was reduced accordingly. Although a large number of variables exists in a test of this type, the data reduction indicated definite trends and consistencies.

a. Statistical Control - In order to maintain consistent control of the data over an extended test period, the low velocity radar target (L-19 aircraft) was used to obtain statistical control data as well as data pertaining to low velocity aircraft. The initial tests were performed with the low velocity aircraft (L-19 propeller-driven aircraft), the second phase used a medium velocity, and the third phase used a high velocity target. In phases two and three, the low speed aircraft was utilized as a time check standard against men and equipment.

b. Fast Targets - Effects of Height Comparator, Target Course Type and
Time Lapse on Acquisition Time - From Data Sets 6 and 7 it is seen that the use
of the Height Comparator resulted in reducing target acquisition time, for
planes #2 and #3, by 4.7 seconds (see Tables 2 and 3).

The data indicate (the small sample size resulted in large errors) that the
course type, radial vs. tangential, may have some effect on acquisition time.
But this effect, if present, is dependent upon the radar-crew combination.
The effect of flying a fast plane radially and tangentially will be found in
Data Set 10. The difference in average acquisition time is 1.5 seconds, but
this figure cannot be considered statistically significant, since the interval
of uncertainty is almost ±2 seconds. Further details are given in Table 4.

The effect of the radar-crew combination upon target acquisition time was
not pronounced, and in fact was statistically insignificant.

Due to the lack of data on radar-crew #2 for target or plane #2
acquisitions, only the data of radar-crew #1 could be used in Table 5 comparing
planes #2 and #3.

Referring to Table 5, the figures 3.26 and 2.35 seconds (in the last
column) are confounded with the effect of a six months time lapse between the
plane #2 and plane #3 flights. This effect has been shown to be statistically
significant. From the data (discussed in Data Set 4) it is seen that the
acquisition times on the L-19 increased by 1.4 seconds because of the six months
time lapse when both crews were studied; 1.07 seconds of this increase was
attributed to radar-crew #1. At this point, one of two assumptions can be made.


Fig. 15. Levels of the variables for a 2^3 factorial experiment. The figure
lists the eight treatment combinations by their coded levels:

Coded levels      Treatment   Factor levels
(-1, -1, -1)      (1)         a1 b1 c1
(+1, -1, -1)      a           a2 b1 c1
(-1, +1, -1)      b           a1 b2 c1
(+1, +1, -1)      ab          a2 b2 c1
(-1, -1, +1)      c           a1 b1 c2
(+1, -1, +1)      ac          a2 b1 c2
(-1, +1, +1)      bc          a1 b2 c2
(+1, +1, +1)      abc         a2 b2 c2


If the degradation applies only to the L-19 acquisitions, and not to the fast
plane, then the second figure of 2.35 seconds quoted above still holds. If
the degradation does apply to the fast aircraft, then the difference in
acquisition time improvement between plane #2 and plane #3 is 1.28 seconds
(2.35 minus 1.07). The improvement figure of 1.28 seconds is not statistically
significant since the interval of uncertainty about it is almost ±2 seconds.
Thus, it cannot be determined from these data whether or not the improvement is
greatest for the fastest aircraft. See Table 6 for further details.
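The time-lapse adjustment above is simple arithmetic, sketched here in Python. The figures 2.35, 1.07 and the ±2 second interval are taken from the text; the function names are hypothetical, and treating "significant" as "the interval of uncertainty excludes zero" is an assumption about how the authors judged significance.

```python
def adjusted_improvement(raw_improvement, time_lapse_effect):
    """Remove the time-lapse degradation from the raw plane-to-plane difference."""
    return raw_improvement - time_lapse_effect


def is_significant(estimate, half_width):
    """Assumed criterion: the interval estimate +/- half_width excludes zero."""
    return abs(estimate) > half_width


raw = 2.35         # seconds, from Table 5 (last column)
lapse = 1.07       # seconds, time-lapse effect attributed to the radar-crew
half_width = 2.0   # seconds, "almost +/- 2 seconds"

adj = adjusted_improvement(raw, lapse)
print(round(adj, 2), is_significant(adj, half_width))
```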

The improvement in acquisition performance with respect to plane #3 is
more outstanding in the reduction of the number of overshoots and the transfer
failure rate.

c. Comparison of Slow and Fast Radar Targets for the Height Comparator:
When the L-19 is compared to the fast aircraft, two important facts emerge:

1. Between the first and second groups of flights, a time lapse
effect of 1.4 seconds was present.

2. There was a gap in the data of plane #2. Acquisition times for
radar-crew #2 with the Height Comparator were essentially missing due to radar
malfunction.

Thus, only the performance of radar-crew #1 was considered. The tables
on the following page from Data Sets 6 and 7 are relevant.

d. The Height Comparator and Slow Targets: The effect of the Height
Comparator is closely related to the skill and proficiency of the radar-crew
combination when consecutive acquisitions are made on a slow aircraft flying
a constant altitude course. This difference (in acquisition time) could be
attributed to the difference in radars, to the difference between the two
models of the Height Comparator, or to the difference in operator skill of
the two crews. One crew made better use of the Height Comparator while the
other did not need it. As an illustration consider Table 7 taken from Data
Set 1.

The improvement in acquisition time, averaged over both crews,
is 1.72 seconds (one-half the sum of 2.64 and 0.80).

From Data Set 2, it is seen that the crew-comparator interaction was
also present in the April 1957 tests. The effect of the Height Comparator
in the May 1957 tests was not statistically significant, but the average
acquisition time of the crews varied by 1.35 seconds.

Though not statistically significant, there is an indication of
interaction between target range and the Height Comparator. If this interaction
is not a random fluctuation, then the results imply that the Height Comparator
is more effective in reducing acquisition time for targets at short range.
This interaction is to be expected, because acquisitions [text illegible]
have greater range slew times.




e. Range and Slewing Effect: For all targets the designation range
of the target consistently had a significant effect on acquisition time.
This effect was expected because of the test procedure. The radar was always
slewed inward from 40,000 yards range, the maximum computer range. However,
this effect occurred with unexpected consistency. The difference in average
acquisition time between short and long ranges varied from 1 to 3 seconds.
For the L-19, the average short range was 12 to 15 thousand yards, while the
average long range was 25 to 26 thousand yards. For the fast aircraft the
range figures are somewhat larger. It is worth noting that slew time was also
recorded. However, search time (acquisition time minus slew time) was not
analyzed. It was felt that the extra effort was not warranted. A cursory
examination showed that the average slew time appeared to run about 4 or 5
seconds. This was considered to be a reasonable amount of time, neither too
great nor too small. When spot checks were performed, the actual slew times ran
greater than the theoretical slew times computed from the maximum slew rate.
The slew time can be considered as range-slew time since azimuth slewing was
relatively unimportant in this test series.

f. Effect of Altitude Maneuver: Data Set 5 describes the effect
of diving and climbing the L-19 during the April 1957 series. The results
given therein appear to be at variance with those reported in prior tests.
However, several factors must be considered. First, different crews were
involved, and as shown previously, the effect of the Height Comparator is
closely related to the radar-crew combination. Second, the operators were
permanent ESL personnel with extensive radar experience. It was clear from
the beginning of the test series that these radar crews possessed more skill
and proficiency than the enlisted men used for the prior tests mentioned
above. Third, the altitude maneuver in the April 1957 tests corroborates
the conclusion that the radar crews were not making full use of the Height
Comparator.


[Table 1: Short summary of results. Table body not recoverable.]

[Table 2: Performance on plane #2 (radar crew 1). For further details see Data Set 7. Table body not recoverable.]

[Table caption fragment: "... is made for a time lapse factor". Table body not recoverable.]

[Figure: Modification-crew interaction (Height Comparator). Plot not recoverable.]


RESULTS AND CONCLUSIONS:

a. Table 1 indicates the acquisition times and failure rates for the
various radar targets or aircraft flown. The following should especially
be noted:

(1) The use of the Height Comparator resulted in a significant
reduction in the transfer failure rate for the aircraft types used in the
tests.

(2) For radar target or plane #3 the transfer failure rate of
weapon batteries without a Height Comparator was inordinately high. In
fact 18.3% of the designations resulted in failures. (Acquisition times
over 32 seconds were considered failures.)

(3) The use of the Height Comparator also resulted in a significant
reduction in average acquisition time of the aircraft types used in
the tests.

b. It was apparent during the test series that the requirement for
multiple operator coordination adversely affects target acquisition.

c. It was apparent that after a brief layoff period, the target track
operators performed below par for the first few target acquisitions. This
initial failure rate is extremely important when defense systems are operated
tactically and subjected to surprise raids. This was evident even for
periods as short as 18 hours.

d. The data indicate that the radar-crew combination exerted a
statistically significant effect upon acquisition time, and must be considered
an important parameter in any analysis of this type. An interaction was
present between this and other parameters undergoing analysis. For example,
the effectiveness of the Height Comparator in acquiring slow aircraft
often depended upon the radar-crew.

e. This analysis also shows the importance of a statistical control
(such as the L-19) when sets of data are separated by large time intervals.
The five months time lapse between the November 1956 and the April 1957
flights resulted in a statistically significant increase of 1.4 seconds
in average acquisition time. This time lapse played an important role
in the comparison between the radar targets, planes #2 and #3.

f. The experimental design required a large number of replications to
guard against the possibility of large variances for each treatment
combination. This fact was suggested by preliminary tests and the resulting
analysis showed this fact to be true. It was therefore felt desirable to
replicate the experiment as many times as possible, e.g., at least a dozen
times. This feature is one of the main differences between this experimental
design and others described in the literature. In spite of the
large variances, the analysis was able to proceed to a successful conclusion
and meaningful results because of the large number of replications.


RECOMMENDATIONS


In the light of the findings of this analysis, the following
recommendations were made to improve target acquisition:

a. Install the automatic Height Comparator (Height Null Meter), or
an equivalent device.

b. Initiate a program of daily intensive "on the site" realistic
training for operators.

c. Due to the importance of operator training, procedures should be
checked and revised when necessary to insure optimum operator performance.
An independent team of radar experts using realistic test procedures should
select a system and test the performance of the operator personnel.

d. Where large time lapses occur during the tests on defense systems,
a statistical control should be utilized. In the test series described
herein, the low-speed L-19 Army aircraft was used for this purpose.

e. In the design of future tests, and in the statistical reduction of
the data, the effect of the radar crew should be clearly differentiated from
the effects of the other variables undergoing study.

f. A study should be made of the acquisition procedure and associated
equipment with a view toward simplifying and reducing the multi-operator
coordination requirements.

ACKNOWLEDGMENTS


The authors wish to express their indebtedness to the many individuals
who assisted in the preparation of this report, and in particular to
Specialist Walter LaMotte, Pfc. F. Seltzer, Martin Orr, to the Signal Corps
and Air Force pilots, to the engineers who designed the modification, and
to the many radar operators.

SHORT GLOSSARY OF TERMS

Altitude Maneuver. Flying the L-19 aircraft on radial and tangential
courses in such manner as to render the current acquisition independent
of the previous acquisition.

Crew, Radar-Crew, Radar and Crew. These terms are used in their widest
meaning to denote the entire man-machine combination.

Height Comparator (Ht. Comp., or Modification). A null-type meter that
compares two voltages or currents; one voltage represents the target
height as seen by a remote source such as an operations center, and the
second voltage is related to the elevation of the radar track antenna.

Plan Position Indicator (PPI). A display that gives azimuth and slant
range of targets on the circular face of a cathode ray tube. It is sometimes
called a polar coordinate or time base display.

Range Slewing. At any instant of time, the track radar is examining a
particular point in range. If the tracking radar is to acquire the target,
the above point must be moved inward or outward. This process is called
slewing.


EFFECTS OF BALLISTICS AND METEOROLOGICAL VARIABLES
ON ACCURACY OF ARTILLERY FIRE


O. P. Bruno

Ballistic Research Laboratories
Aberdeen Proving Ground, Maryland
U. S. Army Ordnance


1. This paper presents preliminary information on an exploratory study which
has been undertaken to investigate some of the ballistic and meteorological
parameters which affect the accuracy of fire with artillery weapons systems. It
is a study which has been in progress at the Ballistic Research Laboratories at
Aberdeen Proving Ground with the cooperation of the Continental Army Command,
the U. S. Army Artillery and Missile School at Ft. Sill and the Evans Signal
Laboratories of the U. S. Army Signal Corps.

2. With the development of atomic artillery and related tactical concepts
requiring relatively small and highly mobile combat units, renewed emphasis has
been placed on the development of new doctrine and capability for accurate
delivery of both atomic and conventional artillery fire. With atomic artillery
particularly, it would be highly desirable to develop a capability for hitting,
with a high probability, a target with the first round fired. Development of
this capability is important to fully exploit the element of surprise. The
effectiveness of the element of surprise is considerably reduced under the
customary techniques of adjustment of fire and registration preliminary to
firing-for-effect on a target. However, this capability is difficult to achieve
because of the many parameters which contribute toward inaccuracy of artillery
fire. Some of these parameters are: (1) interior ballistic variations in muzzle
velocity caused by gun tube condition; differences in propellant weight,
temperature, moisture and other characteristics; shell differences in weight
and banding; and (2) exterior ballistic effects including variation in shell
weight, surface finish, shape and stability (sometimes expressed as variation
in ballistic coefficient). Other factors include meteorological effects, such
as wind velocity and direction, temperature, and density; and still other
factors, such as accurate determination of distance and azimuth to the target.

3. Let us assume that the Firing Battery has one lot of ammunition and one
gun. Further, that they have calibrated (1) their gun tube and ammunition (i.e.,
they have a good estimate of the velocity level of the gun tube-ammunition
combination), and (2) the ballistic coefficient is calibrated (i.e., the
difference between the ballistic coefficient of the shell lot and the ballistic
coefficient assumed in the Firing Table is known or negligible).

4. Also let us assume that the Firing Battery has meteorological
information which can be used to estimate the effect of non-standard
meteorological conditions on range.

5. Let us also assume that the distance between the gun position and the
target is known accurately.


6. Then the error involved in hitting the target with a round of ammunition
may be represented by:

$$\sigma_R^2 = \sigma_V^2\left(\frac{\Delta R}{\Delta V}\right)^2 + \sigma_C^2\left(\frac{\Delta R}{\Delta C}\right)^2 + \sigma_M^2 + \sigma_\theta^2\left(\frac{\Delta R}{\Delta \theta}\right)^2 + \sigma_W^2\left(\frac{\Delta R}{\Delta W}\right)^2$$

where

$\sigma_R^2$ is the variance in range in yards,

$\sigma_V^2$ is the variance in velocity round to round independent of shell
weight variations,

$\left(\frac{\Delta R}{\Delta V}\right)$ is the differential effect in range for a unit change in
velocity,

$\sigma_C^2$ is the variance in ballistic coefficient round to round
independent of shell weight variations,

$\left(\frac{\Delta R}{\Delta C}\right)$ is the differential effect in range for a unit change in
ballistic coefficient,

$\sigma_M^2$ is the variance in metro error measurement among occasions. In this
study this will take on several values as we consider various degrees of metro
staleness,

$\sigma_\theta^2$ is the variance in gun tube angle of departure upon firing,

$\left(\frac{\Delta R}{\Delta \theta}\right)$ is the differential effect in range for a unit change in
angle of departure,

$\sigma_W^2$ is the variance in shell weight,

$\left(\frac{\Delta R}{\Delta W}\right)$ is the differential effect in range for a unit change in
weight.

Through past studies of various calibers of artillery, excellent estimates of
all of the coefficients and differential effects for ballistic parameters are
available. However, reliable estimates for $\sigma_M$ are not available. The
purpose of this study is to obtain estimates of these parameters.
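The variance expression of paragraph 6 can be evaluated numerically as in the following minimal Python sketch. Each ballistic term is a standard deviation multiplied by its differential effect and then squared; the metro term is already expressed in yards. All numerical values and names below are invented for illustration and are not estimates from this study.

```python
from math import sqrt


def range_variance(ballistic_terms, sigma_metro):
    """Sum of squared (sigma * differential effect) terms plus the metro variance.

    ballistic_terms maps a label to (sigma, dR_per_unit), where dR_per_unit is
    the change in range (yards) for a unit change in that parameter.
    """
    total = sigma_metro ** 2
    for sigma, d_range in ballistic_terms.values():
        total += (sigma * d_range) ** 2
    return total


# Hypothetical inputs (assumptions for illustration only).
terms = {
    "velocity": (2.0, 5.0),               # sigma_V, dR/dV
    "ballistic_coefficient": (0.5, 12.0), # sigma_C, dR/dC
    "angle_of_departure": (0.5, 8.0),     # sigma_theta, dR/dtheta
    "shell_weight": (0.2, 10.0),          # sigma_W, dR/dW
}
sigma_m = 30.0  # metro error standard deviation in yards (hypothetical)

variance = range_variance(terms, sigma_m)
print(variance, sqrt(variance))
```

Evaluating the function once per staleness condition, with a different metro value each time, would give the combined range error for each degree of staleness.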

7. It was considered probable that the metro error variance $\sigma_M^2$ may
vary systematically depending upon the type of meteorological conditions
encountered on a day, the distance between the metro station and the firing
battery, and the staleness of the metro data (i.e., the change in true metro
conditions between the time the metro data was taken and the time that the
metro information was used). It was also considered advisable to study the
variation attributable to differences among metro batteries.



8. The types of experimental design adopted for this study were somewhat
dictated by the nature of the parameters which required study and also by
economic and logistic considerations. These considerations frequently influence
the types of designs which can be adopted where relatively large scale
experimentation is involved. In this particular case only four metro batteries
were available and it was estimated that they could be conveniently located at
distances of approximately 1, 5, 10, and 20 miles from the selected firing
position. While metro batteries and their equipment were located in these
positions it would be practical for them to take metro observations at 0600,
0800, 1000 and 1200 hours in a full day's work. A two hour interval would be
reasonable for them to digest the data and develop the metro message.
Consideration of the above factors precluded the random selection of metro
staleness. It was also considered desirable to study the factor of staleness
independently in order that the metro error variance $\sigma_M^2$ could take on
various values depending on the degree of staleness. Hence, the statistical
design was a Latin Square with two replications where the three factors, days,
distances, and metro batteries, were studied in the designs for zero hours,
2 hours, 4 hours, and 6 hours staleness independently. In other words, the
analysis involved 4 x 4 Latin squares with two replications for each of the
conditions of staleness under study (0, 2, 4, 6 hours) for each of the two
weapons. Since metro data was developed on each day at 0600, 0800, 1000 and
1200 hours and firings were conducted at 0800 and 1200 hours, it was possible
to get two sets of Latin squares for each of zero hours staleness and two hours
staleness, and one square for each of four hours and six hours staleness.
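The count of data sets per staleness condition follows directly from pairing each firing time with every metro observation taken at or before it. A minimal Python sketch, with hypothetical names and times written as 24-hour integers:

```python
METRO_TIMES = (600, 800, 1000, 1200)   # metro observation times (0600 ... 1200)
FIRING_TIMES = (800, 1200)             # firing times (0800 and 1200)


def staleness_pairs(metro_times, firing_times):
    """Group (metro time, firing time) pairs by staleness in whole hours."""
    pairs = {}
    for fire in firing_times:
        for metro in metro_times:
            if metro <= fire:  # only data already taken can be used
                hours = (fire - metro) // 100
                pairs.setdefault(hours, []).append((metro, fire))
    return pairs


pairs = staleness_pairs(METRO_TIMES, FIRING_TIMES)
counts = {hours: len(p) for hours, p in sorted(pairs.items())}
print(counts)
```

The integer division by 100 is valid here only because every time falls on the hour.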


                      Distance or Location

Days    L1 (1 mi)    L2 (5 mi)    L3 (10 mi)    L4 (20 mi)
  1         A            B            C             D
  2         B            C            D             A
  3         C            D            A             B
  4         D            A            B             C

                      Distance or Location

Days    L1 (1 mi)    L2 (5 mi)    L3 (10 mi)    L4 (20 mi)
  5         D            A            B             C
  6         B            D            C             A
  7         A            C            D             B
  8         C            B            A             D

(The letters A, B, C, D denote the four metro batteries.)
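A quick check that both arrangements are valid Latin squares, i.e. that each metro battery appears exactly once on every day and once at every location, can be sketched in Python as follows. The rows are transcribed from the two squares above, and the function name is hypothetical.

```python
SQUARE_1 = ["ABCD", "BCDA", "CDAB", "DABC"]  # days 1-4
SQUARE_2 = ["DABC", "BDCA", "ACDB", "CBAD"]  # days 5-8


def is_latin_square(rows):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(rows)
    symbols = set(rows[0])
    if len(symbols) != n:
        return False
    if not all(len(r) == n and set(r) == symbols for r in rows):
        return False
    return all({r[j] for r in rows} == symbols for j in range(n))


print(is_latin_square(SQUARE_1), is_latin_square(SQUARE_2))
```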



9. The firing program to develop the necessary information to serve as
input data for the design was as follows: The four meteorological batteries
were scheduled for occupation of the four different positions on each of the
eight days in accordance with the Latin Square Designs indicated previously.
The days for firing were selected at random. The Metro Batteries were instructed
to take meteorological observations with the Radiosonde GMD-1 equipment at the
hours of 0600, 0800, 1000, and 1200 each day. Field Artillery firing batteries
were instructed to fire two artillery weapons (different calibers) at 0800 and
1200 on each day. These firings were carried out with rounds from two selected
lots of ammunition representing the two calibers. The sample of n rounds fired
on each occasion was drawn at random from the lot. For each caliber the charge
and the quadrant elevation were fixed for all firings. Three range observation
posts were used to measure the range of each round fired. Two doppler
chronograph units were used to measure the velocity of each round fired. With
this information, it was possible to compute rather accurately the range to each
center of impact (corrected for velocity). For each center of impact it was
possible to determine the actual effect of the existing non-standard
meteorological conditions as opposed to the estimated effect of the non-standard
meteorological conditions as computed from the meteorological data and Firing
Tables. This latter value represents the input data for each cell in the
Latin Square Design; therefore, it was possible to obtain input data for each
of several conditions of meteorological staleness, namely, 0 hours, 2 hours,
4 hours, and 6 hours of meteorological data staleness. For example, it was
possible to compute the estimated meteorological effect for non-standard
conditions existing at 0600 and compare this with the actual effect on range of
firings performed at 0800. This represents a condition of 2 hours staleness.
The difference represents the error in estimating the effect of non-standard
metro conditions and is used as input data in this analysis. Similarly, the
firings at 0800 and 1200 were used in conjunction with the meteorological
data for 0600, 0800, 1000 and 1200 hours to provide the input information
for the study.

The results of the analysis for this study indicated the following:

a. The differences among Metro Batteries were not significant,
although the training and experience of the personnel of the meteorological
batteries varied considerably.

b. That for the conditions existent in this study at Ft. Sill
the distances between the field position and the location of the
meteorological units at 1, 5, 10, and 20 miles were not significant.

c. That the experimental error was fairly constant for all
of the Latin Square Designs and that for one weapon it was about 38 yards,
while for the other weapon it was 31 yards at 9800 yards range.

d. The component of variance day to day for 0 hours, 2 hours, 4
hours, and 6 hours of staleness was significant and increased accordingly.

Charts No. 1 and No. 2 for the two weapons under study show the relationship
between the standard deviation in range $\sigma_R$ as a function of range for
each of the ballistic parameters under study; namely, the variation due to
velocity $\sigma_V\left(\frac{\Delta R}{\Delta V}\right)$, ballistic coefficient
$\sigma_C\left(\frac{\Delta R}{\Delta C}\right)$, angle of departure
$\sigma_\theta\left(\frac{\Delta R}{\Delta \theta}\right)$, and shell weight
$\sigma_W\left(\frac{\Delta R}{\Delta W}\right)$. There are also plotted the
estimated values for the components of variation attributable to errors of
estimation of the effects of non-standard meteorological conditions,
$\sigma_{M_0}$, $\sigma_{M_2}$, $\sigma_{M_4}$, $\sigma_{M_6}$, where 0, 2, 4,
and 6 hours of staleness are involved respectively. Also plotted are the
combined estimates for both the ballistic and meteorological sources of
variation, $\sigma_{R_0}$, $\sigma_{R_2}$, $\sigma_{R_4}$, $\sigma_{R_6}$, for
0, 2, 4, and 6 hours of staleness respectively.


While this study is relatively limited with regard to the number of types
of weapons, the topographical location and the one range at which firings were
performed with each of the two weapons, it is believed that some valuable
conclusions can be drawn within the framework of this experiment:

a. That the equipment for measurement of meteorological parameters
such as wind velocity, wind direction, temperature, and density does not
contribute appreciably toward meteorological errors.

b. That the training and capability of Meteorological Battery personnel
is not a particularly limiting factor in developing sufficiently accurate
meteorological information.

c. That for topographical and meteorological areas similar to Ft. Sill
the distance of up to 20 miles between the firing point and location of
meteorological batteries is not particularly significant or important.

d. That the most important factor is meteorological staleness. When
meteorological data of 0 staleness is used, the error is approximately
equivalent to the ballistic errors inherent in the ammunition-gun systems.
(Meteorological errors and the ballistic errors contribute approximately
equally to the total range error.) However, it is recognized that it is not
physically possible under the current system to have available meteorological
data for 0 hours staleness since an appreciable amount of time is required for
the reduction, dissemination, and use of the meteorological information.

e. That the use of meteorological data which is 2 hours, 4 hours, or
6 hours old contributes appreciably more error than the ballistic errors. It was
apparent that the round to round ballistic errors are relatively negligible in
comparison to the errors in adjustment for the effect of non-standard metro
conditions when 2, 4 and 6 hour stale meteorological data is used.

f. It is apparent that the development of capabilities to obtain,
reduce, disseminate, and use meteorological information immediately before
firings may improve the accuracy of artillery fire and contribute appreciably
toward the objective of increasing the probability of hitting the target with
the first round fired.


[Chart: Shell, H.E., M1 (dualgran), 105 mm howitzer, charge [illegible]. Abscissa: range (yds.).]

[Chart: Shell, H.E., M106, 8-inch howitzer, charge [illegible]. Abscissa: range (yds.), 0 to 12,000.]


CHARACTERISTICS OF VARIOUS METHODS OF COLLECTING DATA
IN TESTS OF INCREASED SEVERITY


A. Bulfinch
Picatinny Arsenal


SUMMARY. The need for a better understanding of the characteristics of
methods for collecting data in tests of increased severity is described. A
number of problem areas in Ordnance research are listed in which tests of this
kind are required.

Characteristics of various standard methods and some of their modifications
have been studied by Monte Carlo techniques. The results of sampling known
normal and skewed distributions are evaluated.

The relation of tests of increased severity to reliability testing is
pointed out.

CONCLUSIONS. 1. Of the methods studied only two are of general interest:

a. The up-and-down method is most useful as an exploratory
method in new situations where nothing is known about the possible
outcome. The original version of the method will converge upon the region of
the 50% point with the least possible effort regardless of where on the
stimulus scale the test is started. The modifications of this method described
will converge on other percentage points with the same efficiency. However,
the up-and-down method has a number of shortcomings.

b. The run-down method is the most versatile. The
original version can accurately determine the location and form of the parent
population distribution. Modifications described are completely distribution
free and can be used in the extreme tails of the curves for determining such
things as safety and reliability.

2. The remaining methods are of little value except in
highly specialized cases. Taken alone these methods tell us nothing about
the parent population sampled.

INTRODUCTION. To appreciate the need for studying the characteristics
of methods for collecting data in tests of increased severity one must know
something about the following:

1. The nature of the tests in which these methods are used.

2. The kind of problem in which these tests are useful.

3. The frequency with which problems requiring these tests occur in
Ordnance research.


We all know that if we strike an explosive hard enough, it will detonate.
If we are careful we can strike it lightly without detonating it. Something
like this is also true of delicate instruments. If we strike the instrument
hard enough we will destroy it. But if we treat it carefully it will operate
as intended. Finding out what happens in between these two extremes (of
mechanical shock) is the objective of tests of increased severity. These
tests are intended to determine how much stimulus (in various forms such as
mechanical shock) is required to cause a given response frequency such as
the detonation frequency of an explosive or failure frequency of an instrument.
This concept of "how much stimulus is required to cause a response"* has come
to be referred to as "sensitivity" to certain stimuli such as mechanical shock,
electric energy, temperature, acceleration, etc. The methods used to collect
data in these tests are now called sensitivity methods and the data collected
are called sensitivity data for explosives and reliability data for missile
components and other instruments.

Many problems requiring tests of increased severity for their solution
are of long standing but are not recognized as such because of their
statistical nature. Further difficulty lies in the fact that sensitivity and
reliability data differ from other data in some respects. First, sensitivity
and reliability data are binomial in nature. That is, there are only two
possible outcomes, success or failure. Secondly, the observed data usually
form a cumulative frequency. That is, the frequency of successes (or
failures) obtained at any given level of stimulus is an estimate of the sum of
all the success (or failure) frequencies up to that stimulus level. As a
consequence an understanding of frequency distributions and probabilities
(relative frequencies) is required to interpret the data (Ref 9).

Tests of increased severity are usually required when the following
question arises: "At what level of stimulus should the test be conducted?"
From a statistical point of view the answer is "At the 50% point". Then the
question immediately arises "How can the 50% point be found?" This is a
problem for tests of increased severity using methods such as the up-and-down
method, the run-down method, and the two-stimuli method described
later in this report.

However, the Ordnance research engineer is not always interested in the
50% point. He is not interested in explosives that detonate 50% of the time
or instruments that function 50% of the time. From a safety standpoint he
is interested in determining the maximum stimulus that can be used without
causing a single detonation from the explosive. From a reliability
standpoint he is interested in the maximum stimulus that can be used without
causing a single failure in the instrument. As a result sensitivity methods
have been developed for estimating points on the cumulative frequency curve
other than the 50% point. The Picatinny Arsenal method and the
first-fire-point method described later are two of these.

In  this  regard  it  would  be  interesting  to  apply  the  theory  of  extreme- 
value  distributions  to  interpret  sensitivity  and  reliability  data.  This 
approach  has  not  been  used  in  this  study. 

Methods for collecting data in tests of increased severity are required
in a wide variety of Ordnance research problems. For example, some type of
test of increased severity is required to collect usable data in each of the
following problem areas:

1. Mechanical shock sensitivity.

   a. Impact tests of high explosives.

   b. Impact tests of artillery fuzes.

   c. Missile components.

   d. Izod impact test of metals.

   e. Izod impact test of plastics.

   f. Impact or drop test of packing cases.

2. Sensitivity to setback pressures of high explosives.

3. Acceleration sensitivity of missile components.

4. Friction sensitivity of explosives.

5. Velocity sensitivity of fuzes and explosives.

6. Voltage sensitivity of fuzes and missile components.

7. Spark sensitivity of pyrotechnic materials.

8. Temperature sensitivity of explosives and missile components.


*"Response" can be defined as an explosive detonation or a missile
component failure.

In each one of these areas, if observations are taken over the full range
of responses from zero to 100% failures (or successes), it will be found that
the data form a sigmoid cumulative frequency curve. Even when testing to
failure in reliability work (Ref 10), the frequency of failures will form a
sigmoid curve. As a result accurate reliability statements can only be made
when the cumulative frequency percentage point associated with the stimulus
level used is known. This percentage point can be determined only by using
sensitivity methods such as those described below.

A search of the literature shows that much has been written on methods
for tests of increased severity. But most of the work has been done with
pure mathematics or with actual materials and equipment. Many of the
mathematical treatments have been found to be impractical. Experiments using
actual materials contain so many uncontrolled variables that the true
characteristics of the data-collecting methods are distorted.

The object of the work reported here is to determine the true
characteristics of several of the available sensitivity methods and some of their
modifications through the use of the Monte Carlo procedure of sampling. It
is expected that this procedure will reveal the true characteristics of these
methods better than previous approaches, and suggest the need for new techniques
for solving present day problems. Methods are required which will give unbiased
estimates of the true population means and variances. It is believed that the
theory of extreme-value distributions will be useful in this effort.



In Monte Carlo procedures it is assumed that all controlled experiments
have  the  following  two  characteristics  in  common: 

1.  A  set  of  experimental  conditions  (in  the  physical  sense)  is  specified. 
This  defines  the  underlying  distribution  and  its  parameters  that  would  be 
formed  if  an  infinite  number  of  observations  were  taken  under  that  set  of 
conditions. 

2.  The  order  of  occurrence  of  the  observed  data  is  always  random  if  no 
effort  is  made  to  bias  the  data. 

From these assumptions, simulated experiments can be conducted as follows:

1. Choose a known distribution which can be considered as representing
the distribution defined by the particular set of experimental conditions
under investigation.

2.  Sample  this  distribution  using  a  set  of  random  numbers. 
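The two steps above can be sketched in a few lines. This is a minimal
illustration only, not the procedure actually used in the study: it assumes
that each specimen has its own critical stimulus level drawn from the known
normal distribution of Table I (mean 20, standard deviation 5), and that the
specimen responds whenever the applied stimulus reaches that level. The names
simulate_trial and response_frequency are hypothetical.

```python
import random


def simulate_trial(stimulus, rng, mean=20.0, sd=5.0):
    """Return True if a fresh specimen responds at this stimulus level.

    Assumption: the specimen's critical level is one random draw from the
    known (normal) parent distribution.
    """
    critical_level = rng.gauss(mean, sd)
    return stimulus >= critical_level


def response_frequency(stimulus, n_trials, seed=0):
    """Observed response frequency from n_trials simulated trials."""
    rng = random.Random(seed)  # fixed seed so the experiment can be repeated
    responses = sum(simulate_trial(stimulus, rng) for _ in range(n_trials))
    return responses / n_trials
```

With a large number of trials, response_frequency(20.0, ...) should come out
near 0.50 and response_frequency(25.0, ...) near 0.84, the areas under the
normal curve at zero and one standard deviation above the mean.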

Simulated  experiments  of  this  kind  have  the  following  advantages: 

1. They are cheap to conduct.

2. They are more practical than many mathematical approaches.

3. They are free of the usual errors encountered in handling materials
and equipment.

4. Reliable estimates of parameters (true values) are economically
obtained.

5. Known distributions can be sampled.

6. They can be used to confirm the validity of mathematical models.

Characteristics  of  the  following  methods  have  been  studied  to  date: 

1. Original Picatinny Arsenal method (Ref

2. First modification of Picatinny Arsenal method.

3. Second modification of Picatinny Arsenal method.

4. First-fire point (Ref 2).

5. First-failure point.

6. Up-and-down method (Refs 1 and 6).

   a. Recommended grouped data calculation (Refs 1 and 6).

   b. Usual grouped data calculation.

   c. Two-failure modification.

   d. Three-failure modification.

   e. Ten-failure modification.

   f. Fifteen-failure modification.

   g. Two-success modification.

   h. Three-success modification.

   i. Ten-success modification.

7. Run-down method (Ref 4).

8. Two-stimuli method (Ref 3).

EVALUATION OF METHODS:

Original Picatinny Method. This method (Ref 3) starts high on the
stimulus scale. If a success* is obtained the next lower (one increment
lower) stimulus level is used for the next trial; if a failure is obtained
the same stimulus level is used for the next trial. New specimens are used
for each trial. This procedure is repeated until a stimulus level is found
at which 10 successive failures are obtained. The next higher stimulus
level is taken as the result. Why this level is taken as the sensitivity
value is not known. The precision of this procedure is very poor. As
shown in Table II**, single determinations (from one series of trials)
must differ by at least 6.5 units before the difference can be declared
significant.
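The stopping rule just described can be written out as a short simulation. The
sketch below is illustrative only and rests on assumptions not stated in the
text: specimens are represented by critical levels drawn from the normal
distribution of Table I, and the function name picatinny_determination and its
parameters are hypothetical.

```python
import random


def picatinny_determination(start, increment, rng,
                            mean=20.0, sd=5.0, run_length=10):
    """One determination by the original Picatinny rule.

    Returns (result, trials): the level one increment above the level at
    which run_length successive failures occurred, and the trials used.
    """
    level = start
    failures_in_a_row = 0
    trials = 0
    while failures_in_a_row < run_length:
        trials += 1
        # A "success" is a response (detonation); a new specimen each trial.
        responded = level >= rng.gauss(mean, sd)
        if responded:
            level -= increment      # success: drop one increment
            failures_in_a_row = 0   # and start counting failures afresh
        else:
            failures_in_a_row += 1  # failure: repeat at the same level
    return level + increment, trials
```

Repeating the determination many times, for example starting at 30 with unit
increments, gives an average result and a standard deviation that can be set
against the known parameters of the distribution sampled.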

Modified Picatinny Method. The first modification of the Picatinny
method used was to take the height at which 10 successive failures were
obtained as the result. The precision of this modification is exactly the
same as that of the original method. However, the percentage point measured
comes a little closer to the 10% point, which is the point the method has
been assumed to determine (Table II). This is a very unfortunate percentage
point at which to make comparisons of explosives. Former work (Ref 7) has
shown that Comp B, RDX, tetryl, and TNT all have the same 1% point. This
makes the very small differences among these standard explosives at the 10%
point practically indistinguishable.


*"Success" is defined as an explosive detonation or a missile component
failure.

**The tables have been placed at the end of this article.




The second modification of the Picatinny method is the same as the first
modification except that the stimulus at which 15 successive failures are
obtained is taken as the result. This improves the precision somewhat but not
to an acceptable extent. Table II shows that two single sets of determinations
must differ by at least 5.1 units before the difference can be declared
significant. This means that at least 26 (5.1 squared) sets of trials must be
conducted and averaged before a difference of one unit can be declared
significant.
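The figure of 26 can be checked with one line of arithmetic. The worked step
below assumes, as is usual for averages of independent determinations, that the
least significant difference shrinks with the square root of the number of
sets averaged; the labels LSD_1 and LSD_n are introduced here for illustration.

```latex
\mathrm{LSD}_n = \frac{\mathrm{LSD}_1}{\sqrt{n}}, \qquad
\mathrm{LSD}_n \le 1
\;\Longrightarrow\;
n \ge \mathrm{LSD}_1^{\,2} = (5.1)^2 = 26.01 \approx 26 .
```

The same reasoning applied to the 6.5 units of the original method gives about
42 sets of trials for a difference of one unit.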

First-Fire Point. The first-fire point (Ref 2) starts low on the stimulus
scale. If a failure is obtained the next higher stimulus level is used. The
stimulus at which the first fire (explosive detonation or missile component
failure) is obtained is taken as the result. This procedure has even poorer
precision (Table II) than the Picatinny method and its modifications. This
would be expected since the sample size used is smaller. However, the
first-fire point method may be useful in reliability testing in situations where
reasonable stimulus increments can be established.

First-Failure Point. The first-failure point starts high on the stimulus
scale. If a success (detonation) is obtained the next lower stimulus level
is used. The stimulus at which the first failure is obtained is taken as the
result. The precision of this method is also unacceptable for explosives work.
It is similar to the first-fire point in this respect (Table II).

Repeated determinations by any one of the above methods tell us nothing
about the magnitude of the parameters of the parent population which we are
striving to measure. This can be seen from Table II by comparing the averages
and standard deviations obtained with the known parameters ($\mu = 20$;
$\sigma = 5$) of the normal distribution sampled. However, any two of these
methods used together will give two points on the cumulative frequency curve
of the parent population. The further apart these points are, the more
accurate the determination of the curve. If these points are plotted on
probability paper the average (50%) point and standard deviation (slope of the
line) of the parent population can be obtained by graphical methods within the
precision of the method and sample size used. This approach to estimating the
parameters is valid for normal distributions and distributions of known form
only.
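The graphical construction on probability paper has a simple numerical
counterpart, sketched here under the assumption that the parent distribution is
normal. Each percentage point is converted to a normal deviate, and the straight
line through the two (stimulus, deviate) pairs gives the mean and standard
deviation. The function name normal_from_two_points is hypothetical.

```python
from statistics import NormalDist


def normal_from_two_points(x1, p1, x2, p2):
    """Mean and standard deviation of the normal curve through two points.

    x1, x2: stimulus levels; p1, p2: cumulative response fractions (0 to 1)
    estimated at those levels. Assumes a normal parent distribution.
    """
    if x1 == x2 or p1 == p2:
        raise ValueError("the two points must differ in both coordinates")
    z1 = NormalDist().inv_cdf(p1)  # normal deviate for the first point
    z2 = NormalDist().inv_cdf(p2)  # normal deviate for the second point
    sigma = (x2 - x1) / (z2 - z1)  # stimulus units per unit normal deviate
    mu = x1 - z1 * sigma           # stimulus at which the deviate is zero
    return mu, sigma
```

For example, the normal-curve entries of Table I at stimulus levels 15.0 and
25.0 (16% and 84%) give normal_from_two_points(15.0, 0.16, 25.0, 0.84), which
returns values close to 20 and 5.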

The converse of the Picatinny methods could be used to estimate a point
on the curve in the region of the 90% point. But the precision would be
expected to be similar to that of the Picatinny methods described above. This
modification of the Picatinny method was not included in the present study
since modifications of the up-and-down method (described below) can measure
both the 10% and 90% regions of the curve. These modified up-and-down methods
appear to be superior to the Picatinny methods for the reasons stated below
under the discussion of the modified up-and-down methods.

The Up-and-Down Method. The unmodified up-and-down method (Refs 1 and 6)
starts any place on the stimulus scale. If a success* is obtained, the next
lower stimulus level, one increment below the first, is used for the next
trial. If a failure is obtained, the next higher stimulus level is used for
the next trial. The stimulus levels must be equally spaced at intervals equal
to the standard deviation. This procedure is repeated throughout the test.
This method is a good exploratory procedure for finding the region of the 50%
point. Even when nothing is known about the possible location of the 50% point,
the up-and-down method will converge on this region with a minimum number of
trials regardless of where on the stimulus scale the testing is started.


*"Success" is defined as an explosive detonation or a missile component
failure.
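The convergence on the 50% region can be seen in a small simulated test. The
sketch below is illustrative and assumes, as in the Monte Carlo work described
earlier, that each specimen's critical level is a random draw from the normal
distribution of Table I; the function name up_and_down is hypothetical.

```python
import random


def up_and_down(start, increment, n_trials, seed=0, mean=20.0, sd=5.0):
    """Return the list of stimulus levels used in an up-and-down test."""
    rng = random.Random(seed)
    level = start
    levels_used = []
    for _ in range(n_trials):
        levels_used.append(level)
        responded = level >= rng.gauss(mean, sd)  # new specimen each trial
        # Success: one increment lower; failure: one increment higher.
        level += -increment if responded else increment
    return levels_used
```

Starting far from the mean, for example up_and_down(40.0, 5.0, 2000), the
levels fall to the neighborhood of 20 within a few trials and then oscillate
about it, so that the average level after the first few trials lies close to
the 50% point.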

The exploratory nature of the up-and-down method makes it very valuable
in new situations where nothing is known of the possible outcomes. Whereas
the original method will converge on the region of the 50% point, modified
up-and-down methods can be made to seek out the region of other percentage
points. Seven modifications of this method are listed in Table V.

Modified Up-and-Down Methods. The two-failure modification means that
two successive failures are required before going to the next higher stimulus
level. Only one success is required before going to the next lower stimulus
level. The other "failure" modifications were conducted in a similar manner
using the indicated number of successive failures before going to the next
higher stimulus level. These modifications force the observations to converge
on stimulus levels somewhat lower than the average of the parent population.

In the "success" modifications two or more consecutive successes are
required before going to the next lower level. Only one failure is required
before going to the next higher stimulus level. These modifications force
the observations to converge on stimulus levels somewhat higher than the
average of the parent population.
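The "failure" and "success" modifications differ from the original method only
in the number of consecutive like outcomes required before the level is
changed. A minimal sketch follows, again assuming specimens drawn from the
normal distribution of Table I; the function name and its parameters
failures_needed and successes_needed are hypothetical.

```python
import random


def modified_up_and_down(start, increment, n_trials, failures_needed=1,
                         successes_needed=1, seed=0, mean=20.0, sd=5.0):
    """Stimulus levels used when runs of failures or successes are required."""
    rng = random.Random(seed)
    level = start
    run_fail = run_succ = 0
    levels_used = []
    for _ in range(n_trials):
        levels_used.append(level)
        if level >= rng.gauss(mean, sd):       # success (response)
            run_succ += 1
            run_fail = 0                       # a success breaks a failure run
            if run_succ >= successes_needed:
                level -= increment
                run_succ = 0
        else:                                  # failure (no response)
            run_fail += 1
            run_succ = 0                       # a failure breaks a success run
            if run_fail >= failures_needed:
                level += increment
                run_fail = 0
    return levels_used
```

With failures_needed=2 the average level used settles below the population
mean, and with successes_needed=2 it settles above it, which is the behavior
described in the two paragraphs above. With both arguments left at 1 the
function reduces to the unmodified method.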

The percentage points shown in Table V were obtained by using the equation
of the normal cumulative frequency curve as follows:

$$\bar{X} = \mu + t\sigma$$

where: $\bar{X}$ = observed mean,

       $\mu$ = population mean,

       $t$ = normal deviate,

       $\sigma$ = population standard deviation.

This equation was solved for "t" and the area under the normal curve
associated with the calculated t-value was found from a table of areas
under the normal curve. These areas are the percentage points listed in
Table V.
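The calculation just described amounts to two steps: form the normal deviate
t = (observed mean - population mean) / (population standard deviation), then
look up the area under the normal curve below t. A minimal sketch, in which the
table lookup is replaced by the standard-library normal distribution and the
function name percentage_point is hypothetical:

```python
from statistics import NormalDist


def percentage_point(observed_mean, pop_mean=20.0, pop_sd=5.0):
    """Percentage point of the normal curve on which a method has converged."""
    t = (observed_mean - pop_mean) / pop_sd  # normal deviate
    return 100.0 * NormalDist().cdf(t)       # area under the curve below t, %
```

For instance, an observed mean of 14.5 against a population mean of 20 and
standard deviation of 5 gives t = -1.1, which corresponds to roughly the 13.6%
point.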

In a new situation a combination of one "success" and one "failure"
modification of the up-and-down method can be used to determine two points
on the cumulative frequency curve of the parent population. From this,
reasonable estimates of the population mean and standard deviation can be
obtained, if the form of the distribution sampled is known. The nature of
these modifications (like the unmodified method) is such that the required
percentage point regions can be found with a minimum of effort. The effect
of not knowing the magnitude of the standard deviation in new situations may
require some repetition to refine the measurements, since for best results
the increments used in the up-and-down method should be of the order of the
population standard deviation.

These modifications of the up-and-down method are preferred to the
Picatinny method or first-fire and first-failure methods because the
up-and-down methods are more efficient. The method of conducting the
up-and-down procedures is better defined and easier to follow consistently
without wasted effort.

However,  the  up-and-down  method  has  its  limitations. 

For example:

1. We have the incongruous situation in which sampling a normal cumulative
frequency with the up-and-down procedure forms a frequency distribution
that is neither cumulative nor symmetrical (Table VII). Fifty-three per cent
of the area under the curve of this distribution is below the mean. Therefore
it can be said to have a slight positive skewness, which tends to give
slightly low estimates of the mean (Table III). Taking the log of the stimulus
units over-compensates for this bias. The fact that the observed frequency
is not cumulative raises the question of whether the stimulus level used
should be considered the midpoint of the grouped data cell. It is clear that
the form of the parent distribution being sampled is the cumulative frequency
since increasing the stimulus level can only increase (or decrease) the
frequency of a response. The frequency cannot rise to a maximum and then
decrease. If the expected distribution is in the form of a cumulative
frequency, then the stimulus levels used must be the grouped data cell maxima.
In spite of the fact that the observed frequency obtained by means of the
up-and-down method is not cumulative, the stimulus levels used must be
considered the cell maxima in order to obtain reasonable estimates of the true
mean when positive responses are used. When negative responses are used the
stimulus levels must be considered the cell minima. The data recorded for the
up-and-down method in Tables III through VII have been obtained in this manner.
That is, one-half the cell width has been arbitrarily subtracted from the
stimulus levels when using positive responses and added to the stimulus levels
when using negative responses.

2. Slightly biased estimates of the mean are obtained when the population
distribution sampled is skewed (Table VI). These biases are in the
direction of the median.

3. The population standard deviation is poorly estimated even when the
population sampled is normally distributed. The observed sample standard
deviation is significantly less than the population standard deviation (Tables
III, V and \0:). The usual formula (Refs 1 and 6) for the standard deviation
associated with the up-and-down method gives a modified mean square rather
than a standard deviation.




4.  This  method  requires  the  stimulus  level  to  be  changed  after  each 
trial,  which  may  be  impractical  in  situations  where  changing  the  level  is 
complicated  or  where  the  responses  are  not  immediately  available. 

5.  The  stimulus  used  must  be  accurately  controllable  at  predetermined 
levels  so  that  the  stimulus  levels  are  equally  spaced  at  intervals  equal  to 
the  standard  deviation. 

A further difficulty with the up-and-down method is the restrictions
placed on the observations by the sampling procedure. The conditions under
which each observation (except the first) is taken are dependent upon the
outcome of the previous observation. As a result all of the observations
are concentrated in the central region of the curve (Table VII for normal
distribution). The probability of an observation being as far as two
standard deviations from the mean in either tail of a normal distribution is
2.28 times per hundred. From Table VII (for normal distribution) it can
be seen that at two standard deviations from the mean, 1479 observations
with the up-and-down method gave no observation in the upper tail and only
one observation in the lower tail. If the observations were random, they
would be in proportion to the frequency distribution of the population
sampled. Since the condition of collecting data permits observations around
the 50% point only but not in the tails, the samples obtained cannot be
considered representative of the population. This is reflected in the
biased standard deviation obtained. It can therefore be concluded that
the data obtained with the up-and-down method are neither independent nor
random and are not representative of the parent population sampled.

Run-Down Method. In this method (Ref 4) a given number of successive
trials are made at each stimulus level used. The stimulus levels are arranged
to cover the entire range of responses (from zero to 100%) in a convenient
number of increments. The size of the increments and the number of trials
used at each stimulus level can be varied to accomplish the intended purpose.
To obtain reasonable precision in the tails, the increments used in the tails
of the curve should be smaller and the number of trials used at each stimulus
level should be larger than those used in the region of the 50% point.
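A run-down test is straightforward to simulate, since every level is tested a
fixed number of times regardless of earlier outcomes. The sketch below is
illustrative only. It assumes specimens drawn from the normal distribution of
Table I, equal increments, and a grouped-data mean in which each stimulus level
is treated as a cell maximum (so the cell midpoint lies half an increment
below it). The function names run_down and grouped_mean are hypothetical.

```python
import random


def run_down(levels, trials_per_level, seed=0, mean=20.0, sd=5.0):
    """Observed cumulative response frequency at each stimulus level."""
    rng = random.Random(seed)
    observed = {}
    for level in levels:
        hits = sum(level >= rng.gauss(mean, sd)
                   for _ in range(trials_per_level))
        observed[level] = hits / trials_per_level
    return observed


def grouped_mean(observed, increment):
    """Grouped-data mean, treating each level as the maximum of its cell."""
    previous = 0.0  # cumulative frequency assumed zero below the lowest level
    total = 0.0
    for level in sorted(observed):
        cell_frequency = observed[level] - previous  # frequency in this cell
        total += (level - increment / 2.0) * cell_frequency
        previous = observed[level]
    return total
```

Using the thirteen stimulus levels of Table I (5.0 to 35.0 in steps of 2.5),
the observed frequencies rise from near zero to near one and the grouped mean
falls close to 20.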

The sampling procedure of the run-down method does not require a knowledge
of the outcome of the previous observation in order to determine the condition
under which the next observation will be taken. That is, the outcome of one
observation does not affect any other. Therefore it can be said that the
observations are independent. There is no restriction in this method as to
where the observations are taken. The stimulus levels used can be picked at
random anywhere on the curve. Once a level is chosen the observations are
completely unrestricted, and therefore occur at random. Since observations
are taken over the entire response range (zero to 100%) and occur at random,
their relative frequencies will be proportional to those of the parent
population and will therefore be representative of the parent population
(Table VIII).

Because the data obtained with the run-down method are independent and
random and represent the population sampled, this method has the following
advantages:

1. The form of the distribution sampled can be determined (Table VIII).

2. Unbiased estimates of the true mean are obtained even when the
distribution sampled is skewed (Table VI).

3. Unbiased estimates of the true variance are obtained when the
distribution sampled is normal (Table III).

4. The validity with which statistical techniques requiring the assumption
of normality are used can be evaluated.

5. The acceptability of new products or new treatments of old products
can be based upon the form of the distribution as well as the mean and variance.

Additional favorable characteristics of the run-down method are as follows:

1. Basic rules of statistical theory are followed.

2. Observed data form a cumulative frequency as expected (Table VIII).

3. The method is useful in a variety of practical situations since once
a stimulus level has been established a number of observations are taken at
that level.

4. Comparison of two or more materials or items can be made at any given
stimulus level using chi-square tests of significance without any assumption
concerning the form of the distributions. This modification of the method is
especially useful when the comparison of interest occurs in the extreme tails
of the curves, such as when measuring the safety of an explosive or the
reliability of a missile component.

5. Prior knowledge of the magnitude of the population standard deviation
is not required.
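The chi-square comparison mentioned in item 4 can be sketched for the simplest
case of two materials tested at one stimulus level. The example is a minimal
illustration, not the calculation used in the study: it uses the ordinary
Pearson statistic for a 2 x 2 table without continuity correction, takes the
probability from the one-degree-of-freedom chi-square distribution, and
requires at least one response and one non-response in the pooled data. The
function name chi_square_2x2 and the counts shown are hypothetical.

```python
import math


def chi_square_2x2(responses_a, n_a, responses_b, n_b):
    """Pearson chi-square (1 d.f.) comparing two response frequencies."""
    table = [[responses_a, n_a - responses_a],
             [responses_b, n_b - responses_b]]
    row_totals = [n_a, n_b]
    total = n_a + n_b
    col_totals = [responses_a + responses_b,
                  total - responses_a - responses_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, 5 responses in 100 trials of one material against 20 responses in
100 trials of another, at the same stimulus level, gives a chi-square of about
10.3, which would be judged significant at the 1% level.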

The major disadvantage of the run-down method is the fact that a relatively
large sample size (total number of trials) is required to determine the
cumulative frequency curve over its entire length. However, this disadvantage
is tempered by the following:

1. This is the only method which can accurately estimate the cumulative
frequency curve over its entire length.

2. If the exact character of the entire cumulative frequency curve is
not required, the one-stimulus (described above) or two-stimuli modification
(described below) can be used.

An additional disadvantage of the run-down method is the fact that biased
estimates of the standard deviations are obtained when the distributions
sampled are skewed (Table VI).




Two-Stimuli Method. The two-stimuli method (Ref 3) is a modification of
the run-down method. Instead of using several stimulus levels to cover the
range of responses, only two stimulus levels are used, hence the name. This
method is usable only when the assumption of normality is valid (or the form
of the distribution is known) and when the response frequencies obtained from
the two stimulus levels differ by as much as 20%.

The advantages of this method under the restrictions mentioned above are
as follows:

1. It is simple to conduct and simple to calculate.

2. It uses relatively small sample sizes.

3. It gives unbiased estimates of the mean and standard deviation.

The disadvantages of this method are as follows:

1. It is sensitive to deviations from the assumed form of the
distribution.

2. It requires some previous knowledge of the location of the cumulative
frequency curve to be sampled.




TABLE I


Known Cumulative Distributions Used in the Monte Carlo Sampling Experiments

True Mean = 20.0
True Standard Deviation = 5.0


                                  Distributions Sampled
                                  (Area Under Curve, %)
Stimulus      Std Dev      Normal     Positively     Negatively
Levels (a)    Units        Curve      Skewed         Skewed

35.0           3.00                   99
32.5           2.50        99         98
30.0           2.00        98         96
27.5           1.50        93         92             99
25.0           1.00        84         85             86
22.5           0.50        69         74             64
20.0           0.00        50         57             43
17.5          -0.50        31         36             26
15.0          -1.00        16         14             15
12.5          -1.50         7          1              8
10.0          -2.00         2                         4
 7.5          -2.50         1                         2
 5.0          -3.00                                   1

(a) The cell maxima.




TABLE II


Characteristics of Various Methods for Collecting
Sensitivity Data Sampling a Normal Distribution

True Mean = 20.0
True Standard Deviation = 5.0


                                     Total    Sample   Percentage
                                     No. of   Size     Point                 Standard
Method                 Increment     Trials   Used     Measured    Average   Deviation   LSD (a)

PA (original) (b)      $\sigma/5$    2240      80      15.5        14.5      2.29        6.5
PA Modified 1 (c)      $\sigma/5$    2240      80      11.8        13.5      2.29        6.5
PA Modified 2 (d)      $\sigma/5$    2548      52       5.6        11.8      1.78        5.1
First-Fire Point       $\sigma/5$    1053     117      25.9        16.2      3.11        8.7
First-Failure Point    $\sigma/5$    1044     116      70.7        25.0      2.55        7.1

(a) Least significant difference (for single determinations) to compare two results.

(b) The stimulus level one inch above the height at which 10 successive failures
are obtained is used as the average.

(c) The stimulus level at which 10 successive failures are obtained is used as
the average.

(d) The stimulus level at which 15 successive failures are obtained is used as
the average.




TABLE III


Comparisons Using Large Sample Sizes
Sampling a Normal Distribution with True Mean = 20.0,
True Standard Deviation = 5.0, Increment = $\sigma/2$


                                          Total
                                          No. of    Sample              Standard
Methods                                   Trials    Size      Average   Deviation

Up-and-Down
  1. Calc as in Ref 1
     a. Successes                         2700      1350      19.8      4.88 (a)
     b. Failures                          2700      1350      19.8      4.92 (a)
  2. Standard grouped data calculation
     a. Successes                         2700      1350      19.8      2.71
     b. Failures                          2700      1350      19.8      2.68

Run-Down
  1. Standard grouped data calculation
     a. Successes                         7200       800      19.9      4.75
     b. Failures                          7200       800      19.9      4.80

Two-Stimuli
  1. Calc as in Ref 3
     a. Successes                         1600       800      20.0      5.03
     b. Failures                          1600       800      20.0      5.03

(a) These values are actually modified mean squares rather than standard deviations.




TABLE IV

Reproducibility of Method Results
Sampling A Normal Distribution

True Mean = 20.0
True Standard Deviation = 5.0

                                       Methods
               Up-and-Down (a)       Run-Down (a)        Two-Stimuli (b)
Replicate      Mean     Std Dev      Mean     Std Dev    Mean     Std Dev

    1          19.6     2.46         19.7     5.01       20.2     4.82
    2          19.5     3.15         20.0     4.60       20.1     4.34
    3          18.4     3.17         19.8     5.02       19.7     3.80
    4          18.8     2.80         20.0     4.86       20.1     5.35
    5          20.1     2.86         20.0     4.93       19.8     5.24
    6          18.9     2.70         20.0     4.85       19.9     5.58
    7          20.5     2.60         20.1     4.95       19.6     5.22
    8          19.3     2.32         19.?     4.84       19.6     4.42

   Ave         19.4     2.77         19.8     4.88       19.8     4.88
   Range        2.1     0.85          0.4     0.42        0.6     1.78

Sample Size       100                   100                 100
No. of Trials     200                   900                 200

(a) Increments used equal one-half the true standard deviation. Standard
    grouped data calculations were used.

(b) Increments and calculations used are described in Reference 3.




TABLE V

Characteristics of Various Modifications of the Up-and-Down Method
Sampling a Normal Distribution

True Mean = 20.0
True Standard Deviation = 5.0

                              Total               Percentage
                    Incre-    No. of    Sample    Point                  Standard
Modification        ment      Trials    Size      Measured    Average    Deviation (a)

None                σ/2       2700      1350       50          19.8       2.5
Two Failure         σ/2        250        74       26          16.8       2.5
Three Failure       σ/2        498       111       18          15.5       2.5
Ten Failure         σ/5       1150       115        8          12.9       2.5
Fifteen Failure     σ/5       1845       123        5.5        11.9       1.2
Two Success         σ/2        405       116       54          20.5       2.0
Three Success       σ/2        720       181       69          22.5       2.0
Ten Success         σ/5       1170       117       90          26.5       1.5

(a) Actual standard deviation of the method, not the modified mean square.




TABLE VI

Effect of the Form of the Distribution on the Characteristics of the
Up-and-Down and Run-Down Methods

True Mean = 20.0
True Standard Deviation = 5.0
Increment = σ/2

               Up-and-Down                         Run-Down
     Positively       Negatively         Positively       Negatively
     Skewed           Skewed             Skewed           Skewed
     x̄       s        x̄       s          x̄       s        x̄       s

     19.5    2.35     20.6    2.83       19.8    4.27     19.9    6.04
     19.4    3.13     20.8    2.52       19.7    4.65     19.8    6.13

LSD (up-and-down) = (2.83 × 3) / √[denominator illegible] = 0.63
LSD (run-down)    = (2.83 × 5) / √100 = 1.41

The total number of trials used in each case is 1000. This is equivalent
to a sample size of 500 in the up-and-down method and a sample size of 100
in the run-down method.

LSD: Least significant difference (Ref 1).




TABLE VII

Observed Frequencies for the Up-and-Down Method

                                        Distributions Sampled
                          Normal               Positively            Negatively
                          Curve                Skewed                Skewed
Stimulus     Std Dev      Area %    Freq       Area %    Freq        Area %    Freq
Levels (a)   Units        (b)       (c)        (b)       (c)         (b)       (c)

  35.0        3.00                               99
  32.5        2.50          99                   98
  30.0        2.00          98         0         96         1
  27.5        1.50          93        27         92        10          99        49
  25.0        1.00          84       182         85        94          86       183
  22.5        0.50          69       431         74       288          64       378
  20.0        0.00          50       478         57       381          43       284
  17.5       -0.50          31       215         36       201          26        90
  15.0       -1.00          16        41         14        25          15        15
  12.5       -1.50           7         4          1         1           8         1
  10.0       -2.00           2         1                                4
   7.5       -2.50           1                                          2
   5.0       -3.00

  Total                             1379                 1001                  1000

(a) The cell maxima.

(b) Taken from Table I.

(c) Observed success frequencies using the following total number of trials:
    Normal Curve - 2750; Positively Skewed - 2000; Negatively Skewed - 2000.




TABLE VIII

Observed Frequencies for the Run-Down Method

                                        Distributions Sampled
                          Normal               Positively            Negatively
                          Curve                Skewed                Skewed
Stimulus     Std Dev      Area %    Freq       Area %    Freq        Area %    Freq
Levels (a)   Units        (b)       (c)        (b)       (c)         (b)       (c)

  35.0        3.00                               99
  32.5        2.50          99       797         98
  30.0        2.00          98       743         96       196
  27.5        1.50          93       673         92       187          99       197
  25.0        1.00          84       562         85       176          86       166
  22.5        0.50          69       411         74       146          64       125
  20.0        0.00          50       254         57       118          43        87
  17.5       -0.50          31       122         36        67          26        51
  15.0       -1.00          16        53         14        20          15        30
  12.5       -1.50           7        19          1         0           8        17
  10.0       -2.00           2                                          4         8
   7.5       -2.50           1                                          2         3
   5.0       -3.00                                                      1         0

(a) The cell maxima.

(b) Taken from Table I.

(c) Observed success frequencies using the following number of trials at
    each stimulus level: Normal Curve - 800; Positively Skewed - 200;
    Negatively Skewed - 200.




REFERENCES


1.  Dixon & Massey, Introduction to Statistical Analysis, New York, McGraw-Hill
Book Co., Inc., second edition, 1957.

2.  Anderson, T. W., Staircase Methods of Sensitivity Testing, NAVORD
Report 65-46, Statistical Research Group, Princeton University,
March 1946.

3.  Churchman, C. W., Tables for Sensitivity Tests Conducted at Two-Stimuli,
Statistical Memo No. 6, Frankford Arsenal Memo Report MR-5^*, May 1955.

4.  Churchman, C. W., Manual for Proposed Acceptance Test for Sensitivity
of Percussion Primers, Frankford Arsenal Report R-259A, January 1945.

5.  Rinkenbach, W. H. and Clear, A., Standard Laboratory Procedures for
Sensitivity, Brisance and Stability of Explosives, Picatinny Arsenal
Technical Report 1401, Revision 1, February 1950.

6.  OSRD Report 4040 (NDRC AMP Report No. 101.1R), Statistical Analysis for
a New Procedure in Sensitivity Experiments, Statistical Research Group,
Princeton University.

7.  Bulfinch, Alonzo, Improved Methods and Techniques for Testing Impact
Sensitivity of Explosives, Picatinny Arsenal Technical Report 2282,
July 1956.

8.  Hartvigsen  &  Vandeback,  Sensitivity  Tests  for  Fuzes,  NAVORD  Report 
3496,  US  Naval  Ordnance  Test  Station,  China  Lake,  California,  March  1955. 

9.  Pieruschka,  Erich,  Mathematical  Foundation  of  Reliability  Theory, 
Research  &  Development  Division,  Ordnance  Missile  Laboratories,  Redstone 
Arsenal,  January  1958. 

10.  Lusser,  Robert,  Unreliability  of  Electronics — Cause  and  Cure,  Research  & 
Development  Division,  Ordnance  Missile  Laboratories,  Redstone  Arsenal, 
November  1957. 


STATING AND TESTING CRITERIA FOR SMOOTHING DATA


Paul  C.  Cox 

White  Sands  Missile  Range 

Maurice Kendall, in the second volume of his Advanced Theory of Statistics
(page 578), states, "There is voluminous literature on trend fitting which
appears to me out of proportion to the importance of the subject." This
comment was undoubtedly correct at the time the volume was prepared (i.e., 1948),
and as applied to data in economics, with which Kendall was primarily concerned.
Since that time, however, the requirements for tracking weapons, targets, and
many other moving objects; the recording of many types of signals; the necessity
for reducing the error (or noise) in the data; and the development of many new
types of equipment to secure and record the data have made the techniques for
smoothing data extremely important in our scientific, engineering, and defense
effort. Furthermore, the advances in the development of high speed digital
computers have made it possible to use procedures which would have been
impractical a few years ago. Thus, developments during the past decade are
requiring new and better techniques for data smoothing. It is the purpose of
this talk to present:

1. Desirable criteria for choosing a certain smoothing technique,

2. Areas where study and research may contribute toward better
procedures,

3. Statistical tests which may be developed that might be used to
test these criteria.


The conventional techniques for smoothing data usually consist of the
following steps:

(a) From the entire set of data select the first $N_1$ points, where $N_1$
is usually an odd number;

(b) Fit a polynomial of degree $r_1$ to these points;

(c) Choose $k_1$ points from the center of the $N_1$ points ($k_1 < N_1$).
At these $k_1$ points compute the polynomial values, which will be accepted
as the smoothed values corresponding to these $k_1$ points;

(d) If velocity and acceleration data are desired, the polynomial may be
differentiated successively;

(e) Select a set of $N_2$ points, the first of which is point
$1 + \tfrac{1}{2}(N_1 - N_2 + k_1 + k_2)$, and fit $k_2$ additional values from
a second polynomial. Velocity and acceleration data will again be obtained by
differentiation. ($N_2$ and $k_2$ will probably equal $N_1$ and $k_1$ most of
the time, but not necessarily all of the time);

(f) This process will be continued with $N_3$, $k_3$, and $r_3$; $N_4$, $k_4$,
and $r_4$; etc., until the data has all been smoothed.
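The steps above can be sketched in a few lines of code. This is only an illustration under simplifying assumptions that are not in the paper: the data are equally spaced, the same N, k, and r are used for every arc, both N and k are odd, and the first and last few points (which never fall in the center of an arc) are left unsmoothed. The function names are hypothetical.

```python
def polyfit(ts, ys, degree):
    """Least-squares polynomial coefficients (lowest power first) via normal equations."""
    m = degree + 1
    A = [[float(sum(t ** (i + j) for t in ts)) for j in range(m)] for i in range(m)]
    b = [float(sum(y * t ** i for t, y in zip(ts, ys))) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for c in range(col, m):
                A[row][c] -= factor * A[col][c]
            b[row] -= factor * b[col]
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def polyval(coeffs, t):
    """Evaluate the polynomial with the given coefficients at t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


def moving_arc_smooth(ys, N=7, k=1, r=2):
    """Fit a degree-r polynomial to N points, keep the central k values, advance by k."""
    assert N % 2 == 1 and k % 2 == 1 and k <= N and r < N
    half_N, half_k = N // 2, k // 2
    local_t = list(range(-half_N, half_N + 1))  # time measured from the arc center
    smoothed = list(ys)                         # end points stay unsmoothed
    center = half_N
    while center + half_N < len(ys):
        arc = ys[center - half_N: center + half_N + 1]
        coeffs = polyfit(local_t, arc, r)
        for offset in range(-half_k, half_k + 1):
            smoothed[center + offset] = polyval(coeffs, offset)
        center += k
    return smoothed


# Usage: a quadratic trajectory is reproduced exactly by a quadratic arc.
track = [0.5 * t * t - 3.0 * t + 2.0 for t in range(30)]
print(moving_arc_smooth(track, N=7, k=3, r=2)[:5])
```

Measuring time from the center of each arc keeps the normal equations well conditioned for the low degrees discussed here; velocity and acceleration, as in step (d), would follow by differentiating the fitted coefficients.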


The  first  problem  is  related  to  the  selection  of  r.  It  is  desirable 
that  the  degree  of  the  polynomial  be  selected  such  that  the  error  in  the 
data  (the  noise)  may  be  eliminated  as  much  as  possible,  and  yet  it  is  not 
desirable  to  choose  r  so  large  that  the  smoothed  data  is  over  fitted.  In 
such  case  the  smoothed  data  may  be  following  a  noise  pattern  rather  than 
the  desired  signal.  The  use  of  the  F  test  to  determine  an  optimum  value 
for  r  is  well  known,  and  there  is  little  more  that  need  be  said  about  this 
technique  here.  However,  it  is  suggested  that  the  choice  of  r  may  also  be 
influenced  by  a  knowledge  of  the  physical  characteristics  of  the  data. 




That is to say, if the equations of motion are reasonably well known and from
this it can be stated that the data should be following a cubic equation (as
an example), it then appears that the use of a cubic polynomial to smooth the
data would be more desirable than using the F test to determine the value for
r. Furthermore, if the F test is not used to determine r, it might be
available to test the desirable magnitude of N, k, or something else.
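The paper does not write out the F test it refers to. One standard form, given here as an assumption about what is meant, compares the residual sums of squares obtained when polynomials of degree r - 1 and r are fitted to the same N points:

```latex
F = \frac{S_{r-1} - S_r}{S_r / (N - r - 1)},
\qquad
S_r = \sum_{i=1}^{N} \bigl( y_i - \hat{y}_i^{(r)} \bigr)^2
```

Here $\hat{y}_i^{(r)}$ is the value of the fitted polynomial of degree r at the i-th point. Under the usual assumption of independent normal errors, F is referred to the F distribution with 1 and N - r - 1 degrees of freedom, and the degree is raised only while the added term remains significant.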

The second problem is related to the choice of N. As a rule, this is
selected on an arbitrary basis, but it seems logical that definite criteria
may be established for this selection. A few years ago, N was rarely chosen
larger than 23 or 30 because the labor would have been prohibitive, but today
with the recent developments in high speed computers it is not unreasonable
to choose a value for N as large as one or two hundred (possibly even more).
There exists a tremendous latitude for the choice of values for N, and it
appears some criterion may be set up which will indicate a most desirable
choice for N; a suitable test should then be devised for testing this
desirability. One concept related to the choice of N is that increasing N
will definitely decrease the variance of the deviation between the smoothed
values and the true values, providing a good fit is retained after taking
a larger value for N. This may be illustrated by the well known formula for
the linear case:

$$\sigma_s^2 = \sigma_e^2 \left[ \frac{1}{n} + \frac{(t' - \bar{t})^2}{\sum (t - \bar{t})^2} \right] \qquad (1)$$

where $\sigma_s^2$ is the variance of the smoothed value at $t = t'$,
$\sigma_e^2$ is the variance of the deviations which exist between the
observations and the true regression curve, and $n = N$ is the number of
points fitted. It can easily be seen that if $\sigma_e^2$ remains constant,
$\sigma_s^2$ becomes smaller as N becomes larger. Unfortunately we cannot
necessarily make $\sigma_s^2$ as small as we please by simply increasing the
value of N. If N is to be increased, it must be done either by increasing the
length of the record or by decreasing the size of $\Delta t$. In the first
case, it may be found that a polynomial will not fit as well for a long record
as for a short one. In the latter case, many practical difficulties are
involved when $\Delta t$ gets below a certain value.
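The linear-case variance formula can be checked numerically. The sketch below assumes equally spaced times 0, 1, ..., N - 1 (an assumption; the formula itself does not require it) and returns the ratio of the smoothed-value variance to the error variance. The function name is hypothetical.

```python
def smoothed_variance_ratio(N, t_prime=None):
    """Return sigma_s^2 / sigma_e^2 for a straight line fitted to N equally spaced points.

    Implements 1/N + (t' - t_bar)^2 / sum((t - t_bar)^2); t' defaults to the center.
    """
    ts = list(range(N))
    t_bar = sum(ts) / N
    if t_prime is None:
        t_prime = t_bar
    sxx = sum((t - t_bar) ** 2 for t in ts)
    return 1.0 / N + (t_prime - t_bar) ** 2 / sxx


# The ratio at the central point falls as 1/N when N grows ...
for N in (5, 21, 101):
    print(N, smoothed_variance_ratio(N))

# ... and for fixed N it is larger at the end of the arc than at the center.
print(smoothed_variance_ratio(5, t_prime=0))
```

For a straight line the central point therefore has the smallest variance, which bears on the choice of k: points taken away from the center of the arc are smoothed less precisely.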

Three suggestions are given regarding the selection of N. The first is
that if computer programs have been prepared for certain values of N, then it
follows that one of these programs will be selected. The second is that, in
general, N may be increased until the variance of the smoothed value,
$\sigma_s^2$ (as well as the corresponding variances for the smoothed
velocities and accelerations), is made as small as we please, although we will
sooner or later reach the point of diminishing returns. The third suggestion
is that some test such as the F test might be developed to indicate a most
desirable value for N.

The third problem is related to the choice of k. Referring to formula
(1), it is clear that the precision is greatest when k is chosen to be
one and this single value is the central value. However, this is not
necessarily the case when using higher degree equations, or when using
velocity or acceleration data rather than position data. Furthermore, even in
those instances in which the minimum error is obtained when the smoothed value
is the central value, it is possible one could choose k larger than one
without any appreciable loss in precision.




It then appears that there is need for study to determine how many and
which of the N values should be used in the smoothing process. Since starting
work on this paper, it has come to my attention that some work has been
done on this by a few companies which are interested in data reduction
problems. In particular, the Jet Propulsion Laboratory has made some excellent
contributions. It appears, however, that nearly everything which has been
written on the subject can be found only in internal company reports.

The fourth problem is one related to the computation of velocity and
acceleration. The problem is simply this: what is the most efficient method
for computing velocity? Is it best to smooth the data, then compute the
velocity; compute the velocity from the raw data and then smooth; or smooth
the raw data, compute the velocity, and then smooth the raw velocity data?
Some work has been done by Mr. Charles Bodwell, formerly of Holloman Air
Force Base, and his results are available in "Data Reduction Report Nr.
M.T.H.T. 295," White Sands Missile Range.

The final comment I wish to make is that the concept of fitting an
orthogonal polynomial of degree r to a set of data is an implication that
the true data can be approximated nicely by means of a polynomial of low
degree, and all deviations from this are simply noise. This suggests that
one should study the possible types of mathematical filters which might
filter out the noise, and it is possible that something may work better
than the well established orthogonal polynomial. For example, it may be
best to fit an exponential function, a sinusoidal function, or simply use
a filter designed to eliminate all frequencies above a certain value.
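As an illustration of the last alternative, a filter that eliminates all frequencies above a certain value can be sketched with a discrete Fourier transform. This is one possible realization, not a method proposed in the paper: it assumes equally spaced samples, treats the record as periodic, and uses a direct transform whose cost grows with the square of the record length. The function name and the cutoff convention (an integer frequency index) are hypothetical.

```python
import cmath
import math


def lowpass(ys, cutoff):
    """Zero every DFT component whose frequency index exceeds `cutoff`, then invert."""
    n = len(ys)
    spectrum = [
        sum(y * cmath.exp(-2j * math.pi * f * t / n) for t, y in enumerate(ys))
        for f in range(n)
    ]
    for f in range(n):
        # Indices f and n - f describe the same physical frequency.
        if min(f, n - f) > cutoff:
            spectrum[f] = 0
    return [
        sum(c * cmath.exp(2j * math.pi * f * t / n) for f, c in enumerate(spectrum)).real / n
        for t in range(n)
    ]


# Usage: a slow signal (1 cycle per record) plus fast "noise" (20 cycles per record).
n = 64
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]
noisy = [s + 0.3 * math.sin(2 * math.pi * 20 * t / n) for t, s in enumerate(slow)]
clean = lowpass(noisy, cutoff=5)
print(max(abs(c - s) for c, s in zip(clean, slow)))
```

Unlike the polynomial arc, such a filter makes no assumption about the degree of the signal, only about the highest frequency it contains.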

In conclusion, it appears to me there is considerable opportunity to
bring the techniques for smoothing data up to the present day needs. There
appears to be considerable work in this field at the present time, but to my
knowledge, most of the work is found in internal company reports and
is not readily available to the general public. This immediately points to
the need for those agencies which have valuable procedures to make an effort
to publish these in scientific journals.


ESTIMATION  BY  INDIRECT  MEANS  OF  EFFECT  OF  BACTERIA 
ON  AN  UNCHALLENGED  HOST 




Morris  A.  Rhian 

Biological  Warfare  Laboratories 
U. S. Army Chemical Corps

The data presented for this discussion show two things: (1) the effects
on three species of animals of doses of B. anthracis spores and (2) variations
in response in the presence or absence of a virulence-enhancing factor.
For our discussion these data may be regarded as examples of host-causative
agent relationships or interactions under several conditions. It is relatively
easy to obtain these host-agent interaction data with certain animals,
but it is virtually impossible to obtain comparable data for other animals.

If  a  certain  host  may  not  be  challenged  with  a  certain  disease  agent,  then 
one  is  forced  to  seek  an  estimate  of  the  host-agent  interaction  by  indirect 
means.  So  the  question  to  be  considered  in  this  meeting  is:  May  data  from 
agent-host  interactions  of  the  type  presented  be  used  to  predict  the  inter¬ 
action  of  this  agent  and  an  unchallenged  host? 


B. anthracis spores, prepared in both standard and experimental suspensions,
were given by intraperitoneal injection. The effect on the host was
measured primarily as the mean time to death of groups of 8 or 10 animals
given relatively large doses of spores. A few LD50 determinations were made
for comparison. In the calculations of mean time to death, the reciprocal
transformation was used, with values of infinity assigned for time to death
of animals that survived the 10-day observation period.
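The reciprocal transformation can be illustrated as follows. The paper gives no formula, so this sketch assumes the usual reading: each time to death is replaced by its reciprocal, a survivor (time of infinity) contributes zero, the reciprocals are averaged over the whole group, and the average is inverted. The function name and the example times are hypothetical.

```python
import math


def mean_time_to_death(times_hours):
    """Mean time to death via the reciprocal transformation.

    Survivors are entered as math.inf, so their reciprocal is 0 and they
    lengthen the group mean without making it undefined.
    """
    rates = [0.0 if math.isinf(t) else 1.0 / t for t in times_hours]
    mean_rate = sum(rates) / len(rates)
    return math.inf if mean_rate == 0 else 1.0 / mean_rate


# Hypothetical group of 8 animals: seven deaths and one survivor.
group = [10, 12, 12, 15, 20, 24, 30, math.inf]
print(round(mean_time_to_death(group), 2))
```

A consequence of this definition is that a group with several survivors can have a mean time to death far longer than the observation period itself.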


Changes in the effect of the agent on the host were produced by adding
egg yolk medium to suspensions of spores. In comparisons of suspensions with
and without egg yolk, diluent was used to equalize the concentration of spores
in the control suspension. The usual dose by injection was 1/2 × 10^9 spores.

In slide 1* the mean times to death for groups of mice given doses of
spores from 24 standard and 15 experimental cultures are shown. There was no
difference between the two types of culture, but a significantly decreased time
to death was observed when egg yolk was added to the spore suspensions.

Comparisons of LD50 values for mice when spores were injected with and
without egg yolk medium are shown in slide 2. Overall, the LD50 for mice was
reduced approximately 38 fold by the egg yolk. Also, with egg yolk all deaths
occurred in 1 day; without egg yolk deaths occurred in 3 to 7 days.

Mean times to death for groups of mice, guinea pigs, and rats given
standard and experimental cultures are shown in slides 3 and 4. In these
comparisons of the 3 species, there are instances in which the effects of the
treatments were alike in all species and other instances in which they were
different. The mean times to death for rats were always less when the spore
suspension contained egg yolk, and the effect of the egg yolk was generally
greater in the rat than in the other 2 species.




* Slides can be found at the end of this paper.




With mice and guinea pigs the mean times to death from spores plus egg
yolk were either the same as or less than the comparative times from spore
suspensions. However, the responses of these two species were not always
the same, and an effect of the egg treatment was shown in one species but
not in the other in 4 of the 9 examples.

Questions  have  occurred  to  me  regarding  the  type  of  work  illustrated. 
Do  the  specific  data  presented  show  defects  in  design  or  execution  of  the 
studies? May data of this kind on the agent-host relationships of several
species  of  animals  be  used  to  predict  an  untried  relationship?  If  the 
approach  illustrated  is  not  satisfactory,  is  there  a  way  that  one  may  esti¬ 
mate  by  indirect  means  the  effects  of  bacteria  on  an  unchallenged  host? 




SLIDE 1: Mean Time to Death (MTD) in Hours of Mice Given
B. anthracis Spores a/

                            Spores +     Spores +     Range of
                            Diluent      Egg          Difference

x̄ of 24 Std. Preps.         10.8         9.2          0 to 6.5
x̄ of 15 Exp. Preps.          9.9         7.9          0 to 7.3
Overall x̄ of 39             10.5         8.7          0 to 7.3

a/ Culture was mixed with an equal volume of diluent or egg yolk medium,
and 1/2 ml was injected intraperitoneally. Dose was approximately
1/2 × 10^9 spores.


SLIDE 2: LD50 of B. anthracis Spores Without and With
Egg Yolk Medium

                                            LD50
                                   Diluent      Egg Medium
MICE
  Standard Fermentor Culture         218           1
  Egg Medium - Shake Flask           383           6.6
  Normal Shake Flask                  82           5.4
  8:1 Ratio Liquid:Agar             3710          65
  Normal Shake Flask                 320          60

  MEAN                               384          10

GUINEA PIG
  8:1 Ratio Liquid:Agar             8500        1000

SLIDE 3: Mean Time to Death in Hours After Injection of
B. anthracis Spores a/ With Diluent or Egg Yolk Medium

                    Mice              Guinea Pig         Rat
Culture          Dil.     Egg        Dil.     Egg       Dil.     Egg

Std. Ferm.       13.2     11.0       31       22        1980     16
 "    "          10.6     10.0       36       19         118     19
 "    "           7.5      7.0       32       16          66     10
 "    "          10.1      7.5       32       19          57     20

a/ Approximate Doses: Mice - 1/2 × 10^9 Spores; Guinea Pigs - 1 × 10^9 Spores;
Rats - 1 × 10^? Spores (exponent illegible in the source).

SLIDE 4: Mean Time to Death in Hours After Injection of
B. anthracis Spores a/ With Diluent or Egg Yolk Medium

                         Mice              Guinea Pig         Rat
Culture               Dil.     Egg        Dil.     Egg       Dil.     Egg

Egg Medium - Ferm.     7.2      5.8       25       15        6767      7
Normal - Sh. Fl.      14.5     11.5       33       33         300     49
8:1 Liquid/Agar       12.5     11.2       34       22          47     24
Freeze Dried           9.0      7.5       31       22          31     11
Freeze Dried          11.1      9.2       30       17          92     10

a/ Approximate Doses: Mice - 1/2 × 10^9 Spores; Guinea Pigs - 1 × 10^9 Spores;
Rats - 1 × 10^? Spores (exponent illegible in the source).




DETERMINATION OF PERFORMANCE CRITERIA FOR
QUARTERMASTER CORPS FUNCTIONS

John K. Sterrett

Research  and  Engineering  Division 
Office  of  the  Quartermaster  General 

In the Quartermaster Corps the electric accounting machines are being
replaced by high-speed data processing machines at a very accelerated rate.
As a result ... the accounting function of the Quartermaster Corps has
benefitted tremendously. However, high-speed electronic data processing machines
are capable of much more than just counting or keeping records.

It appears logical ... pertinent ... and timely ... to investigate the
additionally useful application of these machines in the control and
management categories. They can feasibly be applied in areas where they can
aid in the planning and control decisions necessary in logistics
enterprises.

As new and more sophisticated data processing machines come into being
the Quartermaster Corps can be expected to utilize them ... however,
it is folly and wasteful to continue the present practice of giving primary
emphasis in these machines to those processes, techniques, and manipulations
which formerly were associated with the use of electric accounting machines.

The machines now installed and those programmed for the future are capable
of much more than their present contribution to the overall Quartermaster
Corps supply mission. Their real potential lies in the fields of supply
control ... logistic category specification ... positioning and reporting ...
and general stock management. In fact, the major benefits yet to be derived
from the present machines will result from the potentials of automation now
feasible through the proper utilization of these new devices.

There is every reason to believe that advances in machine technology
and computer logic being developed and introduced into the machines of the
future ... coincidental with a corresponding integration of logistics and
supply operations ... will bring about improvements far greater than have as
yet been envisioned. An awareness of ... and alertness to this situation
preceded the initiation of the "Study of Future Scientific Quartermaster
Corps Control of Inventories" ... a phase of which brings me here before
you at this time in the Clinical Sessions.

The overall purpose of the effort in this study is to devise and pursue
new and unique approaches to Quartermaster Corps supply management and
inventory techniques. It is our intention that the approach shall adequately
reflect and be suitably oriented to the potentials inherent in the very latest
methods of systems analysis ... further ... we intend that it shall encompass
the proper utilization of the most recent advances in the more sophisticated
types of high-speed automatic data processing machines.




Significantly important to the successful prosecution of this study is
an intimate understanding of the processes necessary for the integrating of
the various functions of the supply mission ... namely ... requirements ...
procurements ... distribution ... warehousing ... inventory control ... and
so forth. No longer can these various phases remain independent.

This integration can only be accomplished through provision and utilization
of technological and functional improvements far in excess of anything
now in being ... and most likely foreign to most of those things which some
people might hold as the proper way to get things accomplished. And these
people are the ones who must be convinced that this new thing being thrust
upon them is really an improvement ... they have to be shown ... and doing
so must not interrupt daily operations ... now how do we do it? That
is one of our many problems which I hope you can shed some light on.

It is emphasized here that we feel that technological advances are not
in themselves sufficient ... the very best mechanization can fall far short
of a desired goal if functional relationships also involved in automation
activities are not fully understood, appreciated, and made an essential part
of the automation flow. Machines and functions must both be integrated if
the best in each is to result.

Carried to its logical conclusion in this study ... this could mean the
integration of a large number of the now separated functions of the
Quartermaster Corps ... integrating of such functions as requirements, distribution,
storage, issue, and disposal actions ... and all these being served by data
generated by a single centrally controlled data processing organization ...
properly manned ... adequately equipped ... and functioning as an integrated
whole to the best advantage of all concerned.

Great strides have recently been made in the construction of mathematical
models to be used in describing and studying involved business systems and
other types of extremely complex management operations. All of these seem to
involve a desirable detachment from the biased conclusions that result when
too close adherence is held to intuitive and qualitative judgements ... also
involved are gaming types of procedures involving the high-speed computers
themselves. We are prepared to apply such analysis techniques and high-speed
computer applications to our problem too. The trouble as we view the problem
is ... where do the standards of comparison come from to determine whether or
not one system of approach is better than another ... in fact is even better
than the present one in existence ... and by how much?

This ... then ... is the essence of the problem I bring to you today ...

Before any comparison can be made of replacement techniques ... if any
are forthcoming ... an agreement must be reached as to what measures are to
be considered pertinent and what combinations of these measures constitute
"acceptable" or "superior" performance. No decision on any given technique
or method can be reached prior to the identification, acknowledgement and
establishment of a set of criteria on which judgements and evaluations are
to be based.




Having set such criteria ... a means must be provided by which various
approaches can be tried and the results recorded for comparison. This is what
some might call "controlled experimentation" ... conceivably it could
involve complex mathematical models and their manipulation under conditions
simulating those encountered within Quartermaster Corps operations and
functions.

Most of the presently employed methods of approach of this sort involve
rather extensive utilization of high-speed computing machines because of
the easy adaptability to gaming theory which these machines permit ... as
well as the large capacity for rapid manipulation which they have as
internal functions.

We have, then, three major problems for which we are seeking assistance:

1. The means to obtain the substantiating data upon which the desired
set of standards is to be based.

2. The weighting to be assigned to each source of this substantiating
data.

3. The manner in which the resulting standards can be applied to arrive
at appropriate ratings.


PROGRAM FOR THE INTERLABORATORY DETERMINATION OF
COMPRESSION SET OF ELASTOMERS AT LOW TEMPERATURES

S.  L.  Eisler 
Rock  Island  Arsenal 

The  purpose  of  this  planned  program  is  to  compare  the  reproducibility  of 
low  temperature  measurements  of  the  compression  set  of  vulcanized  elastomers. 

This test measures the ability of elastomers to recover, at subzero
temperatures, from compressive deformation applied at room temperature.

The Ordnance Materials Research Office and the Elastomers Unit of the Rock
Island Arsenal Laboratory have been assigned the responsibility for designing
the program, preparing the test specimens for all participants and analyzing
the results. This assignment was made by Working Group 4 of Technical Committee
of the International Organization for Standardization.

The program, as submitted to the various participants for review and
possible suggested changes, contains the following variables:

1. Specimen Sizes (2)

2. Test Temperatures (2)

3. Rubber Compositions (3)

4. Laboratories (7)

5. Replications (duplicates to be run on each of 2 days)

We would like to be able to determine if there is a significant difference
in reproducibility between:

1. Laboratories

2. Specimen sizes

3. Test temperatures

and (4) within laboratories between the two test days.

Similar compression set programs have been conducted by individual
laboratories involving a smaller number of variables. In these cases, the
results have usually been analyzed by means of a series of "F" tests. This,
of course, involves testing each level of the second variable separately and
often it is not possible to arrive at a definite overall conclusion. For
example, compression set measurements were made in one laboratory on nine
different elastomers using two specimen sizes. The resultant "F" tests showed
a significant difference between specimen sizes in the case of three
elastomers but not for the other six elastomers.

The results of another program involving three laboratories, three compounds
and two methods were analyzed in a different manner. In this program, each
laboratory ran duplicates on each of four days for each compound-method
combination. The results were analyzed by preparing an Analysis of Variance
table for each of the six compound-method combinations. A typical table is
shown below:

Source          Sum of Sq.    Mean Sq.    F

Bet. Labs.         34.21        17.10
Bet. Days          24.00         8.00
D x L int.         41.12         6.85     3.42*
Within Days        24.02         2.00
Total             123.55

In this example, due to the significant interaction, it was reported that
the residual (within days) variance could not be used as a measure of
experimental error. Therefore, it was necessary to consider the means of the
pairs of duplicates instead of the original readings. This error variance was
obtained from the following equation:

$N S_1^2 + S_2^2 = S_3^2$

where

$S_1^2$ = error variance required
$S_2^2$ = residual variance = 2.00
$S_3^2$ = pooled main effects and interaction mean squares = 9.03
$N$ = no. of replications = 2

The error variances thus calculated for the two methods were compared by
means of an "F" test to determine whether a significant difference existed
between the precision of the two methods. This, however, involved three
separate "F" tests, one for each compound.

The question I should like to present to the panel at this time is what
is the most efficient method for analyzing the results of the proposed
program in order to compare the reproducibilities between laboratories,
specimen sizes and test temperatures as well as within laboratories.
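The arithmetic behind the typical table and the error variance of the duplicate means can be checked with a short script. This is a sketch only: the degrees of freedom (2, 3, 6 and 12) are not printed in the table and are inferred here from the stated layout of three laboratories, four days and duplicates, the relation "N times error variance plus residual equals pooled mean square" is an assumed reading of the equation in the text, and all variable names are illustrative.

```python
# Sums of squares as printed in the typical table.
sums_of_squares = {
    "between labs": 34.21,
    "between days": 24.00,
    "days x labs": 41.12,
    "within days": 24.02,
}

# Assumption: degrees of freedom inferred from 3 laboratories, 4 days,
# and duplicate readings (24 readings in all); they are not in the table.
degrees_of_freedom = {
    "between labs": 2,
    "between days": 3,
    "days x labs": 6,
    "within days": 12,
}

mean_squares = {
    source: sums_of_squares[source] / degrees_of_freedom[source]
    for source in sums_of_squares
}

# F ratio for the interaction, tested against the within-days mean square.
f_interaction = mean_squares["days x labs"] / mean_squares["within days"]

# Pool the two main effects and the interaction.
pooled_sources = ("between labs", "between days", "days x labs")
pooled_ss = sum(sums_of_squares[s] for s in pooled_sources)
pooled_df = sum(degrees_of_freedom[s] for s in pooled_sources)
pooled_ms = pooled_ss / pooled_df

# Assumed relation: N * error_variance + residual_variance = pooled_ms.
n_replications = 2
residual_variance = mean_squares["within days"]
error_variance = (pooled_ms - residual_variance) / n_replications
```

Under these assumptions the pooled mean square and the interaction F ratio agree with the printed 9.03 and 3.42; the proposed seven-laboratory program would need the same bookkeeping repeated for every specimen size, temperature and composition, which is the inefficiency the question to the panel is aimed at.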


AN APPRAISAL OF SEQUENTIAL ANALYSIS UNDER CONDITIONS
RESTRICTED BY THE REQUIREMENT FOR ADVANCED
SCHEDULING AND PROGRAMMING


Edgar W. Larson and Walter D. Foster
Biological Warfare Laboratories
U. S. Army Chemical Corps

1. INTRODUCTION. The design of experiments may be broadly defined as
the vehicle used to provide answers to questions posed by its partner and
teammate, the subject matter field. More and more widely the answers are
being accepted on the underlying basis of probability. To narrow the scope
of this paper immediately, we have selected from the many current designs
that which is known as Wald's Sequential Analysis. Our thinking and limited
experience in its use with respect to testing devices designed to aerosolize
bacterial suspensions are reported here. There is reason to believe that
the principle of sequential analysis may be useful in increasing the
efficiency of our testing efforts which are restricted by the requirement for
advanced scheduling and programming.

2. TECHNICAL CHARACTERISTICS OF THE PROBLEM. The aerosolizing devices
are essentially mechanical, ordnance type, and may use compressed gas,
electricity, burning propellants, pyrotechnic fuels, high explosives, or
combinations of these as energy sources for the dissemination of bacteria in
small airborne particles, starting from concentrates of the organisms either
in liquid suspension or as dry powder. The primary responsibility for
development of a particular device rests with a design engineer. Several
devices undergo research and development concurrently. When a device is in
the concept stage it is possible and necessary to delineate the design
variables which can conceivably affect performance, i.e., disseminating
efficiency, and to make decisions concerning the practical range of test
levels for each variable within which aerosolization performance must be
measured. The object of the research and development is to determine the
treatment combination or combinations which can be expected to render
airborne in small particles the greatest number of viable bacteria from the
initial suspension, hereinafter referred to as fill. Further, it is desired
that such treatments be expected to produce bacterial aerosols which decay
with time after dissemination at a minimum rate. Hence, at least two
parameters are required to summarize the results of a single aerosol test.
One of these reflects the degree of aerosol stability with time, i.e., a
measure of the decay rate, while the second reflects the level of recovery,
either the regression intercept or mean.


In conducting aerosol tests, closed, aerosol-tight testing chambers
are employed. The chamber atmosphere is conditioned with respect to
temperature and relative humidity; the disseminator is positioned centrally
within the chamber and charged with a measured amount of fill, the bacterial
density of which has been previously determined; the device is energized and
dissemination takes place. The resulting aerosol, which is allowed to age as
long as an hour, is sampled periodically for concentration. The basic datum
from the trial is per cent recovery computed simply as the percentage of the
numbers of airborne bacteria to the numbers of bacteria contained in the
initial fill charge. The data are subsequently subjected to a logarithmic
transformation because the transformation tends to stabilize variances between
sampling periods and from treatment to treatment and (very conveniently)
because the plot of log per cent recovery versus cloud age tends to describe a
straight line.
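The reduction of a single trial to an intercept and a slope can be sketched as follows. This is a minimal illustration only: the function name, the use of base-10 logarithms, and the sample figures are assumptions made for the example and are not the laboratories' actual computing routine.

```python
import math


def fit_log_recovery(cloud_age_min, percent_recovery):
    """Least-squares line through log10(per cent recovery) versus cloud age.

    Returns (intercept, slope): the intercept reflects the level of
    recovery, the slope the rate of decay of the aerosol with time.
    """
    y = [math.log10(p) for p in percent_recovery]
    n = len(y)
    x_mean = sum(cloud_age_min) / n
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in cloud_age_min)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(cloud_age_min, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical trial: 10 per cent recovery at age zero, falling by a
# factor of ten over the hour during which the cloud is sampled.
ages = [0, 15, 30, 45, 60]
recoveries = [10.0 * 10 ** (-age / 60) for age in ages]
intercept, slope = fit_log_recovery(ages, recoveries)
```

The two numbers returned are the pair of parameters that summarize a trial in the analyses described in the following sections.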



There are several reasons for scheduling and programming in advance.
Firstly, the needs of several development programs must be satisfied.
Secondly, for the most part the treatment combinations for test are determined
by the results from prior experimentation. When decisions are reached
concerning the treatments which will be subjected to test, time must be
allotted for preparation of drawings, for procurement of fabrication materials
and for scheduling machine shop time. Further, the procurement, preparation
and characterization of fill materials is time consuming and must be
accomplished immediately prior to use because of the instability of the
bacteria in suspension. Also, testing technology requirements change from
experiment to experiment, laboratory glassware and equipment requirements are
changed and there are changes in bacterial growth media and suspending fluid
requirements depending upon the experimental objectives. Hence, complex
scheduling problems are imposed.

3. EXPERIMENTAL DESIGN CONSIDERATIONS. For the most part, heretofore,
we have capitalized on the use of experimental designs with sample sizes
fixed in advance. Because of the nature of our problem and the
characteristics of the testing system, balanced factorial designs, randomized
in either complete or incomplete blocks, have been employed to the greatest
extent. It is not uncommon that the design engineer may have a need to
investigate the effects of as many as ten variables. If one subjects each
variable to test at only three levels, it is obvious that many thousands
of treatment combinations are made available. Of course, every treatment
combination is not examined, but because the problems are subject to
interacting variables, high order (three-level designs, for example) factorial
experiments are executed. The choice of variables for inclusion in a single
experiment is generally made from engineering considerations; the engineer has
the option, of course, to go back and combine variables from test to test in
additional experiments. Our natural experimental block is limited by the
number of trials it is possible to complete in a single working day in one
aerosol chamber, i.e., from six to nine per day. The aerosol data, reduced to
regression intercepts and slopes, are commonly subjected to classical analyses
of variance according to the selected design. This procedure possesses
certain shortcomings which we would like to overcome. Firstly, the analysis
of variance computations are numerous and often involved. Further, there
is a tendency to ignore type II errors, i.e., the error of accepting the
null hypothesis when it is false. Finally, when an inherently variable
biological system is involved whose variance is neither well established nor
consistent, there are risks of either under-testing or over-testing.
Under-testing fails to yield information permitting a decision while
over-testing is expensive. Thus, the approach using fixed sample size is
desirable from some standpoints and unsatisfactory from others.
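The size of the full factorial mentioned above is easily worked out. The short sketch below does the arithmetic; treating every one of the ten variables at exactly three levels and running a single chamber with no replication are simplifying assumptions made for the illustration.

```python
import math

n_variables = 10
n_levels = 3

# Every combination of ten variables at three levels each.
full_factorial = n_levels ** n_variables

# One chamber completes from six to nine trials in a working day.
trials_per_day_low, trials_per_day_high = 6, 9
days_at_nine_per_day = math.ceil(full_factorial / trials_per_day_high)
days_at_six_per_day = math.ceil(full_factorial / trials_per_day_low)
```

Even at nine trials per day the complete set would occupy a chamber for several thousand working days, which is why only selected subsets of variables are carried into any one experiment.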

4. SEQUENTIAL ANALYSIS. In the interest of reaching decisions in shorter
testing time, we are exploring the possibility of using sequential designs,
starting with Wald's designs. Briefly, we want here to review what these
designs are and what questions they can answer for us. Primarily, the concept
involves testing a null hypothesis against a specific alternative hypothesis
with respect to a population mean or variance, offering either one or two
sided tests. Knowledge of the population variance is required. When the rates
of error, α and β, are specified, together with the alternative hypothesis, the
design is complete. Analysis is achieved by computing a simple statistic
which is either tabled or plotted. A decision is reached when the statistic
exceeds either of the two bounds, which graphically are shown as the familiar
pair of parallel lines. As originally derived, the design was applicable
only to a single mean or variance. By a simple modification (undoubtedly
discovered and rediscovered by countless users), two means can be accommodated
by writing the hypothesis with respect to the difference, remembering, of
course, to use the variance of the difference in constructing the design.
This modification especially lends itself to the conduct of paired trials.
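The pair of parallel lines referred to above can be written down explicitly for the case of a normal mean with known variance. The sketch below uses the standard form of Wald's limits for the cumulative sum of observations (or of paired differences, with the variance of the difference supplied); the function names are illustrative assumptions, and no numerical values from the trials reported here are used.

```python
import math


def wald_boundaries(n, mu0, mu1, sigma2, alpha, beta):
    """Limits for the cumulative sum after n trials.

    Tests H0: mean = mu0 against H1: mean = mu1 (mu1 > mu0) with the
    variance sigma2 known. Returns (lower, upper): accept H0 when the
    sum is at or below lower, accept H1 when it is at or above upper,
    otherwise continue testing. Both limits share the slope
    (mu0 + mu1) / 2, hence the parallel lines.
    """
    delta = mu1 - mu0
    slope = (mu0 + mu1) / 2.0
    lower = sigma2 / delta * math.log(beta / (1.0 - alpha)) + n * slope
    upper = sigma2 / delta * math.log((1.0 - beta) / alpha) + n * slope
    return lower, upper


def sequential_decision(observations, mu0, mu1, sigma2, alpha, beta):
    """Run the test trial by trial; return the decision and trials used."""
    total = 0.0
    for n, x in enumerate(observations, start=1):
        total += x
        lower, upper = wald_boundaries(n, mu0, mu1, sigma2, alpha, beta)
        if total <= lower:
            return "accept H0", n
        if total >= upper:
            return "accept H1", n
    return "continue", len(observations)
```

Because the limits depend on the variance, an error in the assumed variance shifts both lines, which is the difficulty taken up in the discussion of disadvantages that follows.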

Naturally with respect to our own requirements, this kind of analysis
has certain advantages and disadvantages. Among the disadvantages is this
restriction to only two treatments when there are many which need testing.
There is no opportunity to estimate interaction, a most important
consideration in development work. A third, and again a most serious
disadvantage, is the inability to know precisely the termination date in the
sequential testing. Finally, using as we do a biological response to evaluate
a candidate treatment, we are not always sure we know the variance. Sometimes
we can say we know it with confidence; other times not at all. Of course, it
is possible in the doubtful cases to resort to the sequential "t" test;
however, the only way to relate the scale of standard deviations needed in the
"t" test to the scale of measurement such that the alternative hypothesis
would then have meaning is through knowledge of the variance or coefficient
of variation - a self-contradicting situation.

But on the brighter side, the advantages include the most highly prized
desideratum, namely, reduced testing time, which, of course, means reduced
testing expense if the difficulties in programming can be obviated. Another
advantage comes in requiring the experimenter to consider an alternative
hypothesis together with Type I and Type II errors - a concept still
relatively unknown outside of statistical circles, especially considering how
widely accepted the term "significant difference" is. Finally, nothing
appeals to an experimenter more than an analysis which is completed on the
same day as the last trial.

5. AN EXAMPLE IN SCREENING. In one of our development projects,
dissemination of relatively large quantities of dry fill in an aerosol chamber
was required. However, for reasons of both safety and technical feasibility,
we could not tolerate the large numbers of bacteria involved if undiluted
fill were to be employed. Therefore, we programmed an experiment to search
for a diluent which in aerosol would yield results percentage-wise similar
to those expected in an undiluted material. Five treatments were included
in the experiment: one-to-ten dilutions of the dry bacteria in Microcele,
Estercil, cornstarch and talcum - all commercial products - and a one-to-ten
dilution of the dry bacteria in the same material previously sterilized.
Four trials in a day were completed with the same experimental treatment
and two trials were completed with the reference, the undiluted material.
We computed the mean for the two reference trials, then developed four
differences, one for each trial, from the results with the experimental
treatment. These differences were obtained for each of four aerosol parameters
including: the intercept and slope from results with a sampler collecting
only small aerosol particles and the intercept and slope from results with a
sampler collecting only large aerosol particles. The experiment was conducted
in five-day cycles, testing a different treatment each day until on the sixth
day, treatment one came up again unless our analysis from the first day had
indicated we could reach a decision. By choice we specified that when rejection
was indicated by any one parameter we would discontinue testing. By the
same token final acceptance required all four parameters to be acceptable.
For each treatment with respect to the reference we set up the null hypothesis
of zero difference against the alternative of 0.1761, which is the equivalent
of a 50% difference in log scale. Type I and II errors were controlled at
[illegible] and 10%, respectively. Analysis during the course of the trials
consisted of computing the statistic, ΣD, where D was the difference between
treatments.
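Two details of this design lend themselves to a brief sketch: the size of the alternative on the log scale, and the rule for combining the four parameters into one decision for a treatment. The function name and the three status labels below are illustrative assumptions; only the combining rule itself and the figure 0.1761 come from the text.

```python
import math

# A 50 per cent difference is a ratio of 1.5 on the original scale;
# its common logarithm is the alternative used in the design.
alternative = math.log10(1.5)


def overall_decision(parameter_decisions):
    """Combine the sequential decisions for the aerosol parameters.

    Each entry is "reject", "accept" or "continue". Rejection on any
    one parameter stops testing; acceptance requires every parameter
    to be acceptable; otherwise testing goes on.
    """
    if any(d == "reject" for d in parameter_decisions):
        return "reject"
    if all(d == "accept" for d in parameter_decisions):
        return "accept"
    return "continue"
```

The rule is deliberately one-sided in its severity: a diluent can be discarded on the evidence of a single parameter but can be retained only on the evidence of all four.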

According to this design, all of the candidate diluents were rejected.
Starch and Estercil were rejected after four trials; talcum and Microcele and
the sterilized material were all rejected after eight trials. Since the
minimum number of trials per day was four, we actually over-tested by two
trials for the Microcele and by three trials for the talcum. Executing the
experiment as we did, one candidate each day, the scheduling problem was
simplified to some extent. At the end of this first cycle of five we knew that
part but not all of a second round would be necessary. While we could
predict roughly the end of testing, sufficient uncertainty remained to require
twelve trials for the sterilized diluent before testing was stopped. All
in all, though this constituted a screening type of experiment and therefore
was a departure from the usual type of study, it is considered that
the sequential analysis approach served to answer the experimental objective
in this case efficiently. Further, the order in which we chose to subject
the treatments to test minimized the scheduling problem.

6. SUMMARY AND CONCLUSIONS. Summarizing this discussion, we have
considered the sequential analysis approach with respect to its possible
shortcomings and advantages when applied to the problem of testing for a
research and development program, unique from the standpoint that engineered
devices are evaluated by a biological response. Shortcomings of its use are:
(a) it is restricted to only two treatments, (b) it provides no opportunity to
estimate interaction, (c) it further complicates already complex work
scheduling, and (d) it depends upon the hazardous assumption in biological
response situations that the population variance is known. Advantages of the
approach are listed as follows: (a) it minimizes the amount of testing
required, (b) it avoids the problem of serious under-testing and over-testing,
(c) it requires the experimenter to consider an alternative hypothesis together
with Type I and Type II errors, and (d) it provides immediate answers. Our
experience has been limited to the use of sequential analysis in screening
type experiments. As applied, the approach appeared to answer the
experimental objectives efficiently. Over-all it is concluded that sequential
analysis possesses characteristics which limit its value for our purposes.
However, under certain conditions this design may constitute a desirable choice
among current methods and further study of the concept may produce
information broadening its application.


SIMPLIFIED PROCEDURES FOR ESTIMATING PARAMETERS OF A
NORMAL DISTRIBUTION FROM RESTRICTED SAMPLES


A.  Clifford  Cohen,  Jr. 
The  University  of  Georgia 


1. INTRODUCTION. In life testing, analysis of inspection data,
dosage-response studies, biological assays, target analyses, and in other
related investigations, it is frequently necessary to estimate distribution
parameters from restricted samples, in particular from truncated and from
censored samples. Truncated samples are those from which certain of the
population values are entirely excluded. Censored samples are those in which
sample specimens whose measurements fall in restricted intervals of the random
variable may be identified and thus counted, but not otherwise observed.
Samples of these types are further classified as singly or doubly restricted,
depending on whether sample observation is restricted in only one or in both
tails of the distribution. Depending on which tail of the distribution is
involved, singly restricted samples are still further classified as left or
right restricted.


Unfortunately, calculating estimates from samples of these types often
involves the solution of complicated non-linear equations, a task which is
likely to be tedious and time consuming even when appropriate tables are
available. Here, we are concerned with reducing this computational labor to
a reasonable level for the practical calculation of maximum likelihood estimates
of the mean and variance of a normal distribution with probability density
function


(1)  $f(x) = (\sigma\sqrt{2\pi})^{-1} \exp\left[-(x - \mu)^2 / 2\sigma^2\right], \quad -\infty < x < \infty.$


The  present  paper  represents  a  consolidation  of  results  given  in  (jQ  for  doubly 
truncated  samples  and  in  ^  for  singly  restricted  samples. 


For singly restricted samples, the required estimates are obtained by
adding simple easily computed corrections which involve only a single auxiliary
function of the sample terminus to the sample mean and variance respectively.
Calculation of estimates accordingly involves interpolation in only one table.
With the exception of estimators given by Gupta [8], who considered singly
censored samples only, previous applicable maximum likelihood estimators have
involved two or more auxiliary functions and therefore interpolation in two
or more separate tables. Estimators derived by Ipsen for singly censored
samples from a normal distribution also involve only a single auxiliary
estimating function and thus interpolation in only one table. However, his
estimators, which are based on certain moment functions of the restricted
distribution, differ slightly from applicable maximum likelihood estimators.
Furthermore, his tabular intervals are too wide and his entries contain too
few significant digits for accurate interpolation. Gupta's maximum likelihood
estimators employ an auxiliary function which unfortunately lacks linearity
even over short intervals of his argument. Consequently, his tabular intervals
also are in many instances too wide for easy interpolation. Auxiliary
functions employed here are approximately linear over moderately wide intervals
of the arguments for both truncated and censored samples, so that accurate
interpolation between table entries is relatively easy in both cases. Tables
and graphs of these auxiliary functions are appended.




In the case of doubly truncated samples, a chart is provided which permits
a  graphic  reading  of  estimates  of  the  standardized  terminals  to  one  or  perhaps 
two  decimals,  and  thus  the  immediate  calculation  of  estimates  of  the  mean  and 
standard  deviation  to  two  or  perhaps  three  significant  digits.  When  greater 
precision  is  required,  iterative  procedures  described  in  m  may  be  employed 
to improve initial approximations obtained from the chart.

Since estimators of this paper were derived by the method of maximum
likelihood, for a given sample they lead to estimates identical, except for
possible errors of calculation, to those that might be obtained from applicable
maximum likelihood estimators previously obtained by Fisher [7], Hald,
Halperin [10], Gupta [8], the author [1], and possibly by others. The
computing routine given here, however, is believed to be much simpler and
easier to carry out. As with maximum likelihood estimators in general, those
for truncated and censored samples are consistent and asymptotically
efficient. They are to be recommended when sample sizes are at least
moderately large. When estimates must be based on samples of size 10 or less,
perhaps even on slightly larger samples, it might be preferable to employ
linear unbiassed estimators based on order statistics as given by Gupta [8]
in the latter part of his paper and by Sarhan and Greenberg.

For the benefit of readers who may wish to delve further into the subject
of restricted sampling, a list of some of the pertinent references is appended.

2. SINGLY TRUNCATED SAMPLES. Let $x_0$ be a known fixed value of the random
variable, x, which we designate as a terminus or truncation point. Now consider
a sample consisting of n observations (values) of this random variable, such
that for each observation (i.e. for each sample value), either

(a) $x > x_0$, in which case truncation is on the left,
or
(b) $x < x_0$, in which case truncation is on the right.

The number of otherwise possible sample values excluded from observation as
a consequence of this restriction is not known.

Throughout this paper, we limit our consideration to a random variable
with probability density function (1). Since this function is symmetrical
about $\mu$, truncation of f(x) on the right at $x_0$ is equivalent to
truncation of f(-x) on the left at $-x_0$. Consequently, it is necessary to
examine only one of these cases in detail, and for this role, truncation on
the left has been selected.

Let F(x) designate the distribution function of x. The probability
that a selected value of this random variable meets the requirements for
inclusion in a sample that is singly truncated on the left at $x_0$ is given as
$1 - F(x_0)$ or in standard units as $1 - F(\xi)$, where

(2)  $F(\xi) = \int_{-\infty}^{\xi} \phi(t)\,dt$, with $\xi = (x_0 - \mu)/\sigma$, and $\phi(t) = (\sqrt{2\pi})^{-1} \exp(-t^2/2)$.

The likelihood function for a sample of the type under consideration is

(3)  $P(x_1, \ldots, x_n) = [1 - F(\xi)]^{-n} (\sigma\sqrt{2\pi})^{-n} \exp\left[-\sum_{i=1}^{n} (x_i - \mu)^2 / 2\sigma^2\right].$

Maximum likelihood estimating equations follow as

(4)  $x_0 - \mu = \sigma\xi$,
     $\bar{x} - \mu = \sigma Z$,
     $s^2 + (\bar{x} - \mu)^2 = \sigma^2 [1 + \xi Z]$,

where $\bar{x}$ and $s^2$ are the sample mean and variance respectively
($\bar{x} = \sum x_i / n$ and $s^2 = \sum (x_i - \bar{x})^2 / n$), and where

(5)  $Z(\xi) = \phi(\xi) / [1 - F(\xi)]$.

The first equation of (4) follows from the second equation of (2). The last
two result from taking logarithms of (3), differentiating with respect to $\mu$
and $\sigma$ in turn, and equating resulting derivatives to zero. The required
estimators, $\hat{\mu}$ and $\hat{\sigma}^2$, and the auxiliary estimator
$\hat{\xi}$ are to be found as simultaneous solutions of (4) in terms of the
sample statistics, $\bar{x}$, $x_0$, and $s^2$. Throughout this paper, the
symbol (^) serves to distinguish maximum likelihood estimators from the
parameters being estimated.
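The differentiation that yields the last two equations of (4) may be set out as follows. The lines below are a sketch of the intermediate steps only, written for a likelihood of the form $[1 - F(\xi)]^{-n} (\sigma\sqrt{2\pi})^{-n} \exp[-\sum (x_i - \mu)^2 / 2\sigma^2]$; they use $dF/d\xi = \phi(\xi)$, $\partial\xi/\partial\mu = -1/\sigma$, $\partial\xi/\partial\sigma = -\xi/\sigma$, and the identity $\sum (x_i - \mu)^2 / n = s^2 + (\bar{x} - \mu)^2$.

```latex
\begin{align*}
\ln P &= -n \ln\left[1 - F(\xi)\right] - n \ln \sigma
         - \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} (x_i - \mu)^{2} + \text{const}, \\
\frac{\partial \ln P}{\partial \mu}
      &= -\frac{n Z(\xi)}{\sigma}
         + \frac{1}{\sigma^{2}} \sum_{i=1}^{n} (x_i - \mu) = 0
         \;\Longrightarrow\; \bar{x} - \mu = \sigma Z, \\
\frac{\partial \ln P}{\partial \sigma}
      &= -\frac{n \xi Z(\xi)}{\sigma} - \frac{n}{\sigma}
         + \frac{1}{\sigma^{3}} \sum_{i=1}^{n} (x_i - \mu)^{2} = 0
         \;\Longrightarrow\; s^{2} + (\bar{x} - \mu)^{2} = \sigma^{2}\left[1 + \xi Z\right].
\end{align*}
```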


On eliminating $(\bar{x} - \mu)$ between the last two equations of (4), we have

(6)  $s^2 = \sigma^2 [1 - Z(Z - \xi)]$ or $\sigma^2 = s^2 + \sigma^2 Z(Z - \xi)$.

Eliminating $\mu$ between the first two equations of (4) leads to

(7)  $\bar{x} - x_0 = \sigma(Z - \xi)$ or $\sigma = (\bar{x} - x_0) / (Z - \xi)$.

Combining (6) and (7) gives

     $\sigma^2 = s^2 + [Z / (Z - \xi)] (\bar{x} - x_0)^2$.

Now let

(8)  $\theta(\xi) = Z / (Z - \xi)$,

and the estimating equation for $\sigma^2$ assumes the form

(9)  $\sigma^2 = s^2 + \theta (\bar{x} - x_0)^2$.

To derive a corresponding equation for estimating $\mu$ which does not
involve any auxiliary function other than $\theta$, we eliminate $(Z - \xi)$
between (6) and (7) to obtain $\sigma Z = (\sigma^2 - s^2) / (\bar{x} - x_0)$.

On combining this result with (9), we have

(10)  $\sigma Z = \theta (\bar{x} - x_0)$.

When (10) is substituted into the second equation of (4), we write the desired
estimating equation as

(11)  $\mu = \bar{x} - \theta (\bar{x} - x_0)$.

By eliminating $\sigma$ between (6) and (7), we obtain the more familiar result*

(12)  $[1 - Z(Z - \xi)] / (Z - \xi)^2 = s^2 / (\bar{x} - x_0)^2$.

The system of estimating equations (4) may now be replaced by the equivalent
system consisting of (9), (11), and (12). Let $\hat{\xi}$ designate the
solution of (12), let $\hat{\theta} = \theta(\hat{\xi})$, and the desired
estimators become

(13)  $\hat{\sigma}^2 = s^2 + \hat{\theta} (\bar{x} - x_0)^2$,
      $\hat{\mu} = \bar{x} - \hat{\theta} (\bar{x} - x_0)$.


As computational aids, tables and a graph of $\theta$ as a function not of
$\xi$, but of $[1 - Z(Z - \xi)] / (Z - \xi)^2$ have been provided. Since
$\hat{\xi}$ is that value of $\xi$ for which
$[1 - Z(Z - \xi)] / (Z - \xi)^2 = s^2 / (\bar{x} - x_0)^2$, we can thereby
determine $\hat{\theta}$ directly for any given sample as that value of
$\theta$ which corresponds to the sample statistic $s^2 / (\bar{x} - x_0)^2$.
Accordingly, the necessity for determining $\hat{\xi}$ explicitly prior to
calculating $\hat{\theta}$ is eliminated, and since $\theta$ is the only
auxiliary function appearing in the estimators (13), only the single table of
that function is needed in contrast to the two or more tables necessary when
employing estimators previously proposed.

Entries of $\theta$ in Table 1 were computed from existing tables of normal
curve areas and ordinates at equal intervals of $\xi$. This, of course,
resulted in unequal intervals of the argument $s^2 / (\bar{x} - x_0)^2$.
Although equal intervals of


*  cf.  for  example  reference  [jQ 




this argument might be desirable, a degree of accuracy adequate for most
practical applications can be achieved through simple linear interpolation,
and the table has proven easy enough to use even with unequal intervals. In
view of this fact, and since any graduation for the purpose of equalizing
intervals would either result in a loss of significant digits or require
complete recomputation of the table, it is offered in its present form.

With $x_0$, $\bar{x}$, $s^2$, and accordingly $s^2/(\bar{x} - x_0)^2$ available from the sample
data, it is necessary only that we read $\hat{\theta}$ from the table or graph as required
and calculate $\hat{\sigma}^2$ and $\hat{\mu}$ from (13). In many applications, $\hat{\theta}$ may be read with
sufficient accuracy from the graph of Figure 1. When more accurate values
are required, they can be obtained by direct reading or by linear interpolation
from Table 1. Only in rare cases should it be necessary to resort to more
complicated non-linear interpolative procedures.
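
The linear interpolation described above can be sketched as follows. The two rows are the adjacent Table 1 entries that bracket the argument of Example 1; the function name and the excerpt variable are illustrative, and a full implementation would carry the whole table.

```python
from bisect import bisect_right

# Two adjacent rows of Table 1: (s^2/(xbar - x0)^2, theta).
TABLE_1_EXCERPT = [(0.21541, 0.02967), (0.21682, 0.03041)]

def interpolate_theta(r, table=TABLE_1_EXCERPT):
    """Linear interpolation of theta at argument r between tabulated rows."""
    args = [a for a, _ in table]
    if not args[0] <= r <= args[-1]:
        raise ValueError("argument outside the tabulated range")
    # Index of the upper bracketing row, clamped so r == args[-1] still works.
    i = min(bisect_right(args, r), len(args) - 1)
    (a0, t0), (a1, t1) = table[i - 1], table[i]
    return t0 + (r - a0) / (a1 - a0) * (t1 - t0)

print(round(interpolate_theta(0.21627), 5))
```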

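Once $\hat{\sigma}$ and $\hat{\mu}$ have been computed, $\hat{\xi}$ follows from (2) without the need
of additional tables as

(14)  $\hat{\xi} = (x_0 - \hat{\mu})/\hat{\sigma}$,

where $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$.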

Although estimators (13) have been derived for samples that are singly
truncated on the left, they are equally applicable when samples are singly
truncated on the right, as a consequence of the symmetry of the normal
probability density function (1). In both cases $\theta \ge 0$, and as shown in the sketch
below, when truncation is on the left, $(\bar{x} - x_0) > 0$ and $\hat{\mu} < \bar{x}$, whereas when
truncation is on the right, $(\bar{x} - x_0') < 0$ and $\hat{\mu} > \bar{x}$.

[Sketch: normal curves showing truncation on the left at $x_0$ and truncation on the right at $x_0'$, each with a lower scale in t.]

On the lower scale, t is the standardized normal deviate, $t = (x - \mu)/\sigma$. Thus,
when x is normal ($\mu$, $\sigma$), t is normal (0, 1), and if $x_0' = -x_0$, it follows that

3. SINGLY CENSORED SAMPLES. We consider two types of censored samples.
A Type I Censored Sample is one in which the terminus or point of censoring is
fixed, while a Type II Censored Sample is one in which the number of censored
observations is fixed. Within each of these categories, we may have censoring
either on the right or left. Here as in the case of truncated samples, the
symmetry of f(x) makes it unnecessary to consider both left and right censoring
in detail, and again our derivations are confined to left censored samples.

Type I Singly Censored Samples

In this category, we consider samples consisting of a total of N
observations subject to the restriction that full measurement (i.e. unrestricted
observation) of the random variable x is possible if and only if

(a) $x \ge x_0$, in which case censoring is on the left,

or

(b) $x \le x_0$, in which case censoring is on the right,

where $x_0$ is a known fixed terminus. Let n designate the number of fully measured
observations and $n_1$ the number of censored observations for which it is known
only that $x < x_0$ ($x > x_0$ in the case of censoring on the right). Since $x_0$
and N are fixed, both n and $n_1$ are random variables subject to the condition
that $n_1 + n = N$.
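
As an illustration of the quantities just defined, the sketch below reduces a Type I left-censored sample to the statistics used later (h, the sample mean, the sample variance, and the tabular argument). The data are invented for the example, the function name is hypothetical, and the divisor n in the variance is an assumption chosen to agree with the form of the estimating equations.

```python
def type1_left_censored_stats(measured, n_censored, x0):
    """Summarize a Type I left-censored sample.

    measured   : fully measured observations (each >= x0)
    n_censored : count n1 of observations known only to lie below x0
    x0         : fixed point of censoring
    """
    n = len(measured)
    if n == 0 or any(x < x0 for x in measured):
        raise ValueError("measured observations must be non-empty and >= x0")
    total = n + n_censored                               # N = n + n1
    xbar = sum(measured) / n
    s2 = sum((x - xbar) ** 2 for x in measured) / n      # divisor n (assumed)
    return {
        "N": total,
        "n": n,
        "h": n_censored / total,                         # h = n1 / N
        "xbar": xbar,
        "s2": s2,
        "ratio": s2 / (xbar - x0) ** 2,                  # argument for lambda
    }

# Hypothetical data: four measured values, two censored below x0 = 10.
stats = type1_left_censored_stats([10.2, 11.0, 12.4, 13.1], 2, 10.0)
print(stats)
```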

The likelihood function for a sample of this type is

(15)  $P = [N!/n_1!]\,[F(\xi)]^{n_1}\,(\sigma\sqrt{2\pi})^{-n} \exp\left[-\sum_{i=1}^{n}(x_i - \mu)^2/2\sigma^2\right]$,

where $\xi$ and $F(\xi)$ are given by (2).

In this case the maximum likelihood estimating equations are

(16)  $x_0 - \mu = \sigma\xi$,
      $\bar{x} - \mu = \sigma Y$,
      $s^2 + (\bar{x} - \mu)^2 = \sigma^2[1 + \xi Y]$,

where

(17)  $Y(h, \xi) = [h/(1 - h)]\,Z(-\xi)$, with $h = n_1/N$.

The first equation of (16) comes from (2) and is identical with the first
equation of (4) for the truncated case. The last two equations of (16) result
from taking logarithms of (15), differentiating with respect to $\mu$ and $\sigma$ in
turn and equating to zero. Here as in the truncated case, $\bar{x}$ and $s^2$ are the
sample mean and variance respectively.

Estimating equations (16) which apply in the censored case differ from
equations (4) which apply in the truncated case only in that $Z(\xi)$
appearing in (4) has been replaced in (16) by $Y(h, \xi)$ which is defined by (17).
Procedures analogous to those employed in the truncated case enable us to
replace the system of equations (16) with the equivalent system,

(18)  $\sigma^2 = s^2 + \lambda(\bar{x} - x_0)^2$,
      $\mu = \bar{x} - \lambda(\bar{x} - x_0)$,
      $[1 - Y(Y - \xi)]/(Y - \xi)^2 = s^2/(\bar{x} - x_0)^2$,

where

(19)  $\lambda(h, \xi) = Y(h, \xi)/[Y(h, \xi) - \xi]$.

Let $\hat{\xi}$ designate the solution of the third equation of (18), let
$\hat{\lambda} = \lambda(h, \hat{\xi})$, and the desired estimators become

(20)  $\hat{\sigma}^2 = s^2 + \hat{\lambda}(\bar{x} - x_0)^2$,
      $\hat{\mu} = \bar{x} - \hat{\lambda}(\bar{x} - x_0)$.

As computational aids in this case, tables and graphs of $\lambda$ as a function
of h and $[1 - Y(Y - \xi)]/(Y - \xi)^2$ have been prepared. Since $\hat{\xi}$ is the
solution of the third equation of (18), we determine $\hat{\lambda}$ directly for any sample
as that value of $\lambda$ which corresponds to the sample statistics h and $s^2/(\bar{x} - x_0)^2$.
As in the truncated case, only one table is required.
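
As an alternative to the table and graphs, the third equation of (18) can be solved numerically. The sketch below is an illustration under assumptions rather than the paper's procedure: Z(−ξ) is taken as the normal ordinate divided by the lower-tail area F(ξ), the search is restricted to the region where Y − ξ is positive (as it must be when the measured mean lies above the censoring point), and all function names and search limits are hypothetical.

```python
import math

def pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def Y(h, xi):
    """Equation (17), with Z(-xi) taken as pdf(xi)/cdf(xi) (assumed form)."""
    return h / (1.0 - h) * pdf(xi) / cdf(xi)

def bisect_root(f, lo, hi, iterations=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lambda_from_sample(h, ratio, lo=-8.0):
    """Return (xi_hat, lambda_hat) for given h and s^2/(xbar - x0)^2."""
    if not 0.0 <= h < 1.0 or ratio <= 0.0:
        raise ValueError("need 0 <= h < 1 and a positive ratio")
    # Upper end of the search: the point where Y(h, xi) = xi.
    upper = bisect_root(lambda xi: Y(h, xi) - xi, lo, 8.0)

    def g(xi):
        y = Y(h, xi)
        return (1.0 - y * (y - xi)) / (y - xi) ** 2 - ratio

    xi_hat = bisect_root(g, lo, upper - 1e-9)
    y = Y(h, xi_hat)
    return xi_hat, y / (y - xi_hat)          # equation (19)

# Statistics of Example 3 (h = 181/300); that sample is censored on the
# right, so the xi returned here carries the opposite sign to the xi_hat
# quoted in the example.
xi_hat, lam_hat = lambda_from_sample(181.0 / 300.0, 0.575515)
print(round(xi_hat, 4), round(lam_hat, 4))
```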

With h, $x_0$, $\bar{x}$, $s^2$ and therefore $s^2/(\bar{x} - x_0)^2$ available from the sample
data, it is necessary only that $\hat{\lambda}$ be read from Table 2 (using two-way linear
interpolation) or from the graphs of Figures 2 or 3 as required, and that $\hat{\sigma}^2$
and $\hat{\mu}$ be calculated using estimators of (20). In Figure 2, $\lambda$ is graphed for
h = 0(.01).27, while in Figure 3, it is graphed for h = 0(.05).75. In both
Figures 2 and 3, the range of $s^2/(\bar{x} - x_0)^2$ is (0, 1.3). In Table 2, $\lambda$ is
given to 4D for h = .01(.01).05(.05).50 and for $s^2/(\bar{x} - x_0)^2$ = 0(.05)1.00.

As in the truncated case $\hat{\xi}$ can, when required, be obtained from (14).
Estimators (20) are equally applicable to both left and right censored samples
for the same reasons that estimators (13) apply to both left and right
truncated samples.

Type II Singly Censored Samples

In this category, we consider samples consisting of N observations of
a random variable with probability density function (1) such that

(a) the smallest N - n observations are counted but not otherwise
measured (in which case censoring is on the left),

or

(b) the largest N - n observations are counted but not otherwise
measured (in which case censoring is on the right).

Let $x_n$ designate the smallest (or largest) completely measured observation,
and the sample thus consists of n completely measured observations each of
which is equal to or greater than $x_n$ (or equal to or less than $x_n$) plus N - n
unmeasured observations about which it is known only that $x < x_n$
(or $x > x_n$).

Estimators for this case turn out to be identical with those of (20) for
Type I Singly Censored Samples, when we let

(21)  $N - n = n_1$.

Although  there  are  no  essential  differences  between  estimators  for  Type 
I  Singly  Censored  Samples  and  those  for  Type  II  Singly  Censored  Samples, 
variances  of  these  estimators  differ  in  the  two  cases  as  may  be  noted  in 
Section  5. 
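
Once the auxiliary value has been read or computed, the estimators (20) and the relation (14) reduce to a few lines of arithmetic. The sketch below is illustrative (the function name is hypothetical); for a Type II sample the terminus argument is the extreme measured observation $x_n$, as indicated by the footnote to Table 2. The inputs are those of Examples 2 and 3 of Section 6.

```python
import math

def censored_estimates(xbar, s2, terminus, lam):
    """Apply estimators (20), then (14); returns (mu_hat, sigma_hat, xi_hat).

    terminus is x0 for a Type I sample and x_n for a Type II sample.
    """
    d = xbar - terminus
    sigma_hat = math.sqrt(s2 + lam * d * d)
    mu_hat = xbar - lam * d
    xi_hat = (terminus - mu_hat) / sigma_hat
    return mu_hat, sigma_hat, xi_hat

# Example 2 (Type I, right censored): x0 = 10, lambda read as 0.71.
print(censored_estimates(8.75, 1.1043, 10.0, 0.71))

# Example 3 (Type II, right censored): x_n = 1450, lambda = 1.35719.
print(censored_estimates(1304.832, 12128.250, 1450.0, 1.35719))
```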

4. DOUBLY TRUNCATED SAMPLES. In this section we consider a sample
consisting of n observations of a random variable x which has probability density
function (1) such that each observation is subject to the restriction
$x_0 \le x \le x_0 + w$, where sample terminals $x_0$ and $x_0 + w$ are fixed. The logarithm of the
likelihood function for a sample of this type is

$L = -n \ln[F(\xi_2) - F(\xi_1)] - n \ln\sigma - \sum_{i=1}^{n}(x_i - \mu)^2/2\sigma^2 + \text{constant}$,

where $\xi_1 = (x_0 - \mu)/\sigma$ and $\xi_2 = (x_0 + w - \mu)/\sigma$.

As derived in [ ], maximum likelihood estimating equations may be
reduced to

$[\bar{Z}_1 - \bar{Z}_2 - \xi_1]/(\xi_2 - \xi_1) = (\bar{x} - x_0)/w$,

$[1 + \xi_1\bar{Z}_1 - \xi_2\bar{Z}_2 - (\bar{Z}_1 - \bar{Z}_2)^2]/(\xi_2 - \xi_1)^2 = s^2/w^2$,

where

$\bar{Z}_i = \phi(\xi_i)/[F(\xi_2) - F(\xi_1)]$, i = 1, 2,

with $\phi$ the ordinate of the standard normal curve.

With $(\bar{x} - x_0)/w$ and $s^2/w^2$ computed for any given sample, coordinates of the
intersection of the corresponding pair of curves in Figure 4 are the required
values of $\hat{\xi}_1$ and $\hat{\xi}_2$. With care, these values can be read to within three
to five units in the second decimal. The desired estimates then follow as

$\hat{\sigma} = w/(\hat{\xi}_2 - \hat{\xi}_1)$ and $\hat{\mu} = x_0 - \hat{\sigma}\hat{\xi}_1$.
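
Values read from the chart can be refined numerically. The sketch below is not the iterative procedure of the cited reference; it is a plain Newton iteration with a finite-difference Jacobian, applied to the pair of moment equations for a doubly truncated normal sample in the form assumed here (standardized truncated mean and variance matched to $(\bar{x} - x_0)/w$ and $s^2/w^2$). Function names, step size, and iteration limit are illustrative.

```python
import math

def pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def residuals(xi1, xi2, a, b):
    """Residuals of the two estimating equations (assumed form)."""
    area = cdf(xi2) - cdf(xi1)
    z1, z2 = pdf(xi1) / area, pdf(xi2) / area
    span = xi2 - xi1
    r1 = (z1 - z2 - xi1) / span - a
    r2 = (1.0 + xi1 * z1 - xi2 * z2 - (z1 - z2) ** 2) / span ** 2 - b
    return r1, r2

def doubly_truncated_estimates(x0, w, xbar, s2, xi1, xi2,
                               eps=1e-6, max_iter=50):
    """Refine starting values (xi1, xi2); return (mu, sigma, xi1, xi2)."""
    a = (xbar - x0) / w
    b = s2 / (w * w)
    for _ in range(max_iter):
        f1, f2 = residuals(xi1, xi2, a, b)
        g1, g2 = residuals(xi1 + eps, xi2, a, b)
        h1, h2 = residuals(xi1, xi2 + eps, a, b)
        j11, j12 = (g1 - f1) / eps, (h1 - f1) / eps
        j21, j22 = (g2 - f2) / eps, (h2 - f2) / eps
        det = j11 * j22 - j12 * j21
        step1 = (f1 * j22 - f2 * j12) / det    # Cramer's rule for J d = f
        step2 = (j11 * f2 - j21 * f1) / det
        xi1 -= step1
        xi2 -= step2
        if abs(step1) + abs(step2) < 1e-12:
            break
    sigma = w / (xi2 - xi1)
    mu = x0 - sigma * xi1
    return mu, sigma, xi1, xi2

# Statistics of Example 4, starting from the chart readings -2.52 and 2.00.
x0, w = 0.5985, 0.0030
print(doubly_truncated_estimates(x0, w, x0 + 0.54978 * w,
                                 0.041242 * w * w, -2.52, 2.00))
```

Newton's method needs reasonable starting values; the chart readings serve that purpose here.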

5. SAMPLING ERRORS OF ESTIMATES. The asymptotic variance-covariance
matrix of ($\hat{\mu}$, $\hat{\sigma}$) is obtained by inverting the matrix whose elements are
negatives of expected values of the second order derivatives of logarithms
of the likelihood functions. Accordingly we obtain

(22)  $V(\hat{\mu}) \sim [\sigma^2/E(n)]\,[\phi_{22}/(\phi_{11}\phi_{22} - \phi_{12}^2)]$,
      $V(\hat{\sigma}) \sim [\sigma^2/E(n)]\,[\phi_{11}/(\phi_{11}\phi_{22} - \phi_{12}^2)]$,
      $\mathrm{Cov}(\hat{\mu}, \hat{\sigma}) \sim [\sigma^2/E(n)]\,[-\phi_{12}/(\phi_{11}\phi_{22} - \phi_{12}^2)]$,

where E(n) is the expected value of the number of completely measured
observations, and $\phi_{11}$, $\phi_{12}$, and $\phi_{22}$ are respectively
$-[\sigma^2/E(n)]\,E(\partial^2 L/\partial\mu^2)$, $-[\sigma^2/E(n)]\,E(\partial^2 L/\partial\mu\,\partial\sigma)$, and
$-[\sigma^2/E(n)]\,E(\partial^2 L/\partial\sigma^2)$. To simplify the notation, L has been written for
ln P, and $\phi_{ij}$ for $\phi_{ij}(\xi)$ or $\phi_{ij}(h, \xi)$ as applicable.

In truncated samples and in censored samples of Type II, n is fixed and
therefore E(n) = n. In samples that are Type I singly censored on the left,
$E(n) = N[1 - F(\xi)]$. The $\phi_{ij}$ for singly truncated and for singly censored
samples of types I and II are

(23)  Truncated Samples
      $\phi_{11}(\xi) = 1 - Z(\xi)[Z(\xi) - \xi]$,
      $\phi_{12}(\xi) = Z(\xi)\{1 - \xi[Z(\xi) - \xi]\}$,
      $\phi_{22}(\xi) = 2 + \xi\phi_{12}(\xi)$.

      Type I Censored Samples
      $\phi_{11}(\xi) = 1 + Z(\xi)[Z(-\xi) + \xi]$,
      $\phi_{12}(\xi) = Z(\xi)\{1 + \xi[Z(-\xi) + \xi]\}$,
      $\phi_{22}(\xi) = 2 + \xi\phi_{12}(\xi)$.

      Type II Censored Samples
      $\phi_{11}(h, \xi) = 1 + Y(h, \xi)[Z(-\xi) + \xi]$,
      $\phi_{12}(h, \xi) = Y(h, \xi)\{1 + \xi[Z(-\xi) + \xi]\}$,
      $\phi_{22}(h, \xi) = 2 + \xi\phi_{12}(h, \xi)$.

It is to be noted that as $N \to \infty$ the $\phi_{ij}$ for type II censored samples approach
the $\phi_{ij}$ for type I censored samples. Likewise as $N \to \infty$, with the ratio n/N
held fixed, then $[1 - F(\hat{\xi})] \to n/N$. In this sense, limiting values of variances
and covariance of estimates become equal in the two cases.


The variance and covariance of estimators based on samples that are
restricted on the left may be calculated by substituting appropriate values of
$\phi_{ij}(\hat{\xi})$ from (23) into (22), where $\hat{\xi} = (x_0 - \hat{\mu})/\hat{\sigma}$ or $(x_n - \hat{\mu})/\hat{\sigma}$ as
applicable. For samples that are restricted on the right, calculations are
the same except that $\phi_{ij}(-\hat{\xi})$ from (23) are substituted into (22).

In the case of doubly truncated samples, variance and covariance of
estimates may be computed as described in [ ].
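
The substitution of (23) into (22) for singly restricted samples can be sketched as below. This is an illustration under assumptions, not a definitive implementation: Z(ξ) is taken as the normal ordinate over the upper-tail area, the `kind` labels and function names are hypothetical, and the caller is expected to pass −ξ̂ for samples restricted on the right, as described above.

```python
import math

def Z(xi):
    """Assumed form of (5): normal ordinate over upper-tail area."""
    pdf = math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(xi / math.sqrt(2.0)))

def phi_elements(xi, kind, h=None):
    """Return (phi11, phi12, phi22) from (23).

    kind is 'truncated', 'type1', or 'type2'; h is required for 'type2'.
    """
    if kind == "truncated":
        z = Z(xi)
        p11 = 1.0 - z * (z - xi)
        p12 = z * (1.0 - xi * (z - xi))
    elif kind == "type1":
        z, c = Z(xi), Z(-xi) + xi
        p11 = 1.0 + z * c
        p12 = z * (1.0 + xi * c)
    elif kind == "type2":
        y = h / (1.0 - h) * Z(-xi)           # equation (17)
        c = Z(-xi) + xi
        p11 = 1.0 + y * c
        p12 = y * (1.0 + xi * c)
    else:
        raise ValueError("unknown kind")
    return p11, p12, 2.0 + xi * p12

def asymptotic_covariance(sigma2, expected_n, p11, p12, p22):
    """Return (V(mu), V(sigma), Cov(mu, sigma)) from (22)."""
    det = p11 * p22 - p12 * p12
    scale = sigma2 / expected_n
    return scale * p22 / det, scale * p11 / det, -scale * p12 / det

# Example 1 of Section 6: left truncated, xi_hat = -1.94, n = 100.
p = phi_elements(-1.94, "truncated")
print(p, asymptotic_covariance(2.405e-6, 100, *p))
```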

The assistance of Mr. Walt G. Herstman, who performed the calculations
necessary for the compilation of Tables 1 and 2, and who rendered material
aid in the preparation of the charts of Figures 1, 2, and 3 is gratefully
acknowledged.

6. ILLUSTRATIVE EXAMPLES. To illustrate the practical application of
estimators derived in the preceding sections of this paper, the following
examples have been selected.

Example 1. Left truncated. To insure meeting a lower specification limit
of 0.1215 in. on the thickness of a certain insulating washer, all production
of this component is sorted through go, no-go gages, and all of thickness less
than this value are discarded. For a random sample of 100 washers selected
from the screened (i.e. the retained) production, it is found that $\bar{x} = 0.124624$
and $s^2 = 2.1106 \times 10^{-6}$. Since n = 100 and $x_0 = 0.1215$, then $(\bar{x} - x_0) = 0.003124$
and $s^2/(\bar{x} - x_0)^2 = 0.21627$. By linear interpolation in Table 1, we obtain
$\hat{\theta} = 0.03012$. Even without Table 1, this value might have been read from Figure
1 to three decimals as 0.030. Under the assumption that x is normally
distributed, we employ estimators (13) and calculate

$\hat{\sigma}^2 = 2.1106 \times 10^{-6} + 0.03012(.003124)^2 = 2.405 \times 10^{-6}$, and

$\hat{\sigma} = 0.00155$,

$\hat{\mu} = 0.124624 - 0.03012(.003124) = 0.1245$.

From (14) $\hat{\xi} = (0.1215 - 0.1245)/.00155 = -1.94$.
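
The arithmetic of Example 1 can be retraced as follows. The function name is illustrative only. Note that carrying unrounded intermediate values through (14) gives a standardized terminus of about −1.95, whereas the rounded estimates used in the text give −1.94.

```python
import math

def truncated_estimates(x0, xbar, s2, theta):
    """Apply estimators (13), then (14); returns (mu_hat, sigma_hat, xi_hat)."""
    d = xbar - x0
    sigma_hat = math.sqrt(s2 + theta * d * d)
    mu_hat = xbar - theta * d
    xi_hat = (x0 - mu_hat) / sigma_hat
    return mu_hat, sigma_hat, xi_hat

mu_hat, sigma_hat, xi_hat = truncated_estimates(
    x0=0.1215, xbar=0.124624, s2=2.1106e-6, theta=0.03012)
print(round(mu_hat, 4), round(sigma_hat, 5), round(xi_hat, 2))
```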


In determining the asymptotic variances and covariance of $\hat{\mu}$ and $\hat{\sigma}$, Z(-1.94) =
0.062399 is calculated from the defining relation of (5) with the aid of
ordinary tables of normal curve areas and ordinates. This value might have been
obtained from "The normal probability function: Tables of certain area-ordinate
ratios and their reciprocals", published as an editorial in Biometrika,
Vol. 42 (1955), pp. 217-22. Using the truncated sample formulas of (23),
we calculate $\phi_{11}(-1.94) = 0.8751$, $\phi_{12}(-1.94) = 0.3048$, and $\phi_{22}(-1.94) = 1.4087$.
Using these values with E(n) = n = 100, and with $\hat{\sigma}^2$ as calculated above, we
employ (22) to calculate $V(\hat{\mu}) \sim 2.98 \times 10^{-8}$, $V(\hat{\sigma}) \sim 1.85 \times 10^{-8}$, and
$\mathrm{Cov}(\hat{\mu}, \hat{\sigma}) \sim -0.65 \times 10^{-8}$. It then follows that
$\sigma_{\hat{\mu}} = \sqrt{V(\hat{\mu})} \sim 1.7 \times 10^{-4}$, $\sigma_{\hat{\sigma}} = \sqrt{V(\hat{\sigma})} \sim 1.4 \times 10^{-4}$,
and $\rho_{\hat{\mu}\hat{\sigma}} = \mathrm{Cov}(\hat{\mu}, \hat{\sigma})/\sqrt{V(\hat{\mu})V(\hat{\sigma})} \sim -0.28$.

In the case of truncated samples and type I censored samples, these
calculations may be somewhat simplified with the aid of tables of elements of the
variance-covariance matrices given by *Hald [9]. Similar tables for type I
censored samples were given earlier by Stevens [15]. Gupta [8] tabled
corresponding matrix elements for type II censored samples while the author and
Woodward [2] tabled the matrix element necessary for calculating $V(\hat{\sigma})$ in the
truncated case.


Example 2. Right Censored Type I. A reaction time test is terminated at
the end of ten hours in order to eliminate the effects of certain contaminants
which are troublesome when the test is continued over a longer period. For a
specific sample of this type, $x_0 = 10$, n = 62, $n_1 = 38$, $\bar{x} = 8.75$, $s^2 = 1.1043$,
$(\bar{x} - x_0) = -1.25$, $s^2/(\bar{x} - x_0)^2 = 0.70675$, and h = 0.38. Two-way linear
interpolation in Table 2 gives $\hat{\lambda} = 0.71$. This same value might have been read from
the graphs of Figure 3. Accordingly, using estimators (20) we calculate

$\hat{\mu} = 8.75 - 0.71(-1.25) = 9.64$,

$\hat{\sigma}^2 = 1.1043 + 0.71(1.5625) = 2.2137$,

$\hat{\sigma} = 1.49$, and $\hat{\xi} = 0.244$.

Since this sample is censored on the right, we need the $\phi_{ij}(-\hat{\xi})$ in
order to determine the variances and covariance of $\hat{\mu}$ and $\hat{\sigma}$. Accordingly, we
calculate values of Z(-0.244) and Z(0.244) as defined by (5). From the type
I censored formulas of (23) we evaluate $\phi_{11}(-0.244)$, $\phi_{12}(-0.244)$, and
$\phi_{22}(-0.244)$. With E(n) = 100[1 - F(-0.244)] = 100 F(0.244), and $\hat{\sigma}^2$ as
calculated above, we employ (22) to calculate $V(\hat{\mu}) \sim 0.070$, $V(\hat{\sigma}) \sim 0.301$,
and $\mathrm{Cov}(\hat{\mu}, \hat{\sigma}) \sim -0.132$.

* Hald's tables are also available in his "Statistical tables and formulas",
published by John Wiley and Sons (1952).


Example 3. Right Censored Type II. A sample of N = 300 electric light
bulbs was life tested until n = 119 had burned out with the result that
$\bar{x} = 1304.832$ hrs., $s^2 = 12128.250$, and $x_n = 1450.000$ hrs. Accordingly
$s^2/(\bar{x} - x_n)^2 = 0.575515$, $n_1 = 300 - 119 = 181$, and h = 181/300 = 0.6033. Visual
interpolation from the graph of Figure 3 gives $\hat{\lambda} = 1.36$, and using estimators
(20) we now calculate

$\hat{\mu} = 1304.832 - 1.36(1304.832 - 1450.000) = 1502$ hrs.,

$\hat{\sigma}^2 = 12128.250 + 1.36(1304.832 - 1450.000)^2 = 40789$, and

$\hat{\sigma} = 202$ hrs. From (14) $\hat{\xi} = (1450 - 1502)/202 = -0.257$.

This example was originally given by Gupta [8], and to the number of
significant digits given, the above estimates are in agreement with those which he
calculated. A more accurate determination of $\hat{\lambda}$ and correspondingly more accurate
determinations of $\hat{\mu}$ and $\hat{\sigma}$ are possible by calculating additional values of
$\lambda$ and related functions directly from tables of the normal curve areas and
ordinates or from the Biometrika editorial tables (loc. cit.) and then
interpolating as summarized below.

$s^2/(\bar{x} - x_n)^2$     $-\xi$        $\lambda$

0.575304         0.25690      1.35712
0.575515         0.25693      1.35719
0.576081         0.25700      1.35735

With $\hat{\lambda} = 1.35719$ as determined above, a recalculation using estimators (20)
gives more accurate values as $\hat{\mu} = 1501.853$, $\hat{\sigma} = 201.815$, and of course
$\hat{\xi} = -0.25693$.

This is a right censored type II sample, and in order to determine variances
and covariances of the sample estimates, we must evaluate the $\phi_{ij}(h, -\hat{\xi})$; that
is, $\phi_{11}(h, 0.257)$, $\phi_{12}(h, 0.257)$ and $\phi_{22}(h, 0.257)$, where h = 0.6033. Calculating
these values using the type II formulas of (23), then with E(n) = n = 119, and
with $\hat{\sigma}^2$ as determined above, we substitute into (22) and subsequently calculate
$\sigma_{\hat{\mu}} = \sqrt{V(\hat{\mu})} \sim 16.6$, $\sigma_{\hat{\sigma}} = \sqrt{V(\hat{\sigma})} \sim 14.9$, and $\rho_{\hat{\mu}\hat{\sigma}} \sim -0.57$.
The rather high correlation between estimates reflects the high degree of
censoring in this example.

Example 4. Doubly Truncated. To illustrate estimation in the doubly
truncated case, we consider an example in which the entire production of a certain
bushing is sorted through go, no-go gauges, with the result that items of
diameter in excess of 0.6015 in. and those less than 0.5985 in. are discarded.
For a random sample of 75 bushings selected from the screened production,
$\bar{x} = 0.600\,149\,33$ in., $s^2 = 0.000\,000\,371\,187$, $x_0 = 0.5985$ and $w = 0.0030$.
Thus $\bar{x} - x_0 = 0.001\,649\,33$, $(\bar{x} - x_0)/w = 0.54978$, $s^2/w^2 = 0.041\,242$, and
visual interpolation between the curves of Figure 4 gives:

$\hat{\xi}_1 = -2.52$ and $\hat{\xi}_2 = 2.00$.

Accordingly,

$\hat{\sigma} = w/(\hat{\xi}_2 - \hat{\xi}_1) = 0.0030/[2.00 - (-2.52)] = 0.00066$, and

$\hat{\mu} = x_0 - \hat{\sigma}\hat{\xi}_1 = 0.5985 - (.00066)(-2.52) = .60016$.

Employing iterative procedures as described in [4], the above initial values
may be improved upon to yield $\hat{\mu} = 0.60017511$ and $\hat{\sigma} = 0.00066302$.


Table 1

Auxiliary Estimating Function $\theta$ for Singly Truncated Samples

$s^2/(\bar{x}-x_0)^2$   $\theta$          $s^2/(\bar{x}-x_0)^2$   $\theta$

0.062??   0.0000335     0.15582   0.00809
0.06405   0.0000413     0.15686   0.00832
0.06569   0.0000490     0.15790   0.00856
0.06739   0.0000626     0.15895   0.00881
0.06916   0.0000768     0.16001   0.00906
0.07100   0.0000940     0.16107   0.00932
0.07291   0.000115      0.16214   0.00959
0.07490   0.000140      0.16322   0.00986
0.07696   0.000170      0.16431   0.01014
0.07911   0.000206      0.16540   0.01042
0.08134   0.000249      0.16650   0.01072
0.08366   0.000301      0.16761   0.01102
0.08608   0.000362      0.16873   0.01133
0.08859   0.000435      0.16985   0.01164
0.09121   0.000522      0.17098   0.01196
0.09421   0.000624      0.17212   0.01230
0.09677   0.000745      0.17327   0.01264
0.09972   0.000887      0.17442   0.01298
0.10279   0.00105       0.17558   0.01334
0.10598   0.00125       0.17675   0.01371
0.10931   0.00148       0.17792   0.01408
0.11277   0.00174       0.17911   0.01446
0.11637   0.00205       0.18030   0.01486
0.12011   0.00241       0.18150   0.01526
0.12400   0.00283       0.18271   0.01567
0.12805   0.00331       0.18393   0.01609
0.13226   0.00386       0.18515   0.01652
0.13663   0.00449       0.18636   0.01696
0.14117   0.00522       0.18762   0.01741
0.14588   0.00605       0.18887   0.01787
0.15076   0.00701       0.19012   0.01835
0.15176   0.00721       0.19138   0.01883
0.15276   0.00742       0.19265   0.01933
0.15378   0.00764       0.19393   0.01983
0.15480   0.00786       0.19521   0.02035

0.19651   0.02088       0.25444   0.05473
0.19761   0.02142       0.25604   0.05598
0.19912   0.02198       0.25765   0.05725
0.20043   0.02254       0.25926   0.05855
0.20175   0.02312       0.26088   0.05987
0.20309   0.02372       0.26250   0.06121
0.20443   0.02432       0.26414   0.06258
0.20577   0.02494       0.26578   0.06398
0.20713   0.02557       0.26742   0.06541
0.20849   0.02622       0.26907   0.06686
0.20986   0.02688       0.27073   0.06833
0.21124   0.02755       0.27240   0.06984
0.21262   0.02825       0.27408   0.07137
0.21401   0.02895       0.27574   0.07293
0.21541   0.02967       0.27743   0.07452
0.21682   0.03041       0.27912   0.07614
0.21824   0.03116       0.28082   0.07779
0.21966   0.03193       0.28252   0.07947
0.22102   0.03272       0.28423   0.08118
0.22253   0.03352       0.28594   0.08292
0.22398   0.03433       0.28766   0.08469
0.22543   0.03517       0.28939   0.08649
0.22689   0.03602       0.29112   0.08833
0.22836   0.03689       0.29286   0.09020
0.22984   0.03778       0.29460   0.09210
0.23132   0.03869       0.29635   0.09403
0.23281   0.03962       0.29811   0.09600
0.23431   0.04056       0.29970   0.09799
0.23582   0.04153       0.30163   0.1000
0.23733   0.04251       0.30340   0.1021
0.23865   0.04352       0.30518   0.1042
0.24038   0.04454       0.30698   0.1064
0.24191   0.04559       0.30875   0.1085
0.24345   0.04665       0.31054   0.1108
0.24500   0.04774       0.31234   0.1130
0.24656   0.04885       0.31414   0.1153
0.24812   0.04998       0.31595   0.1176
0.24969   0.05114       0.31776   0.1200
0.25127   0.05231       0.31957   0.1224
0.25285   0.05351       0.32140   0.1249

0.32323   0.1274        0.39921   0.2660
0.32506   0.1299        0.40116   0.2705
0.32690   0.1325        0.40311   0.2752
0.32873   0.1351        0.40507   0.2799
0.33057   0.1377        0.40702   0.2847
0.33243   0.1404        0.40897   0.2896
0.33428   0.1432        0.41090   0.2945
0.33613   0.1460        0.41288   0.2995
0.33800   0.1488        0.41483   0.3045
0.33986   0.1517        0.41680   0.3096
0.34173   0.1546        0.41876   0.3148
0.34361   0.1576        0.42072   0.3201
0.34548   0.1606        0.42267   0.3254
0.34736   0.1636        0.42463   0.3308
0.34925   0.1667        0.42659   0.3362
0.35113   0.1699        0.42853   0.3417
0.35302   0.1731        0.43051   0.3473
0.35492   0.1764        0.43247   0.3530
0.35682   0.1797        0.43443   0.3588
0.35872   0.1830        0.43639   0.3646
0.36062   0.1864        0.43835   0.3705
0.36253   0.1899        0.44031   0.3764
0.36443   0.1934        0.44227   0.3825
0.36635   0.1969        0.44423   0.3886
0.36826   0.2006        0.44619   0.3948
0.37018   0.2042        0.44815   0.4010
0.37210   0.2079        0.45010   0.4074
0.37402   0.2117        0.45206   0.4138
0.37595   0.2155        0.45402   0.4203
0.37788   0.2194        0.45597   0.4269
0.37981   0.2234        0.45792   0.4335
0.38174   0.2274        0.45988   0.4402
0.38367   0.2314        0.46183   0.4471
0.38561   0.2355        0.46378   0.4540
0.38755   0.2397        0.46573   0.4609
0.38947   0.2439        0.46767   0.4680
0.39143   0.2482        0.46962   0.4751
0.39337   0.2526        0.47157   0.4824
0.39532   0.2569        0.47351   0.4897
0.39727   0.2614        0.47545   0.4971

0.47739   0.5045        0.55283   0.8803
0.47932   0.5121        0.55465   0.8918
0.48126   0.5195        0.55646   0.9033
0.48320   0.5275        0.55827   0.9150
0.48513   0.5353        0.56007   0.9268
0.48706   0.5432        0.56184   0.9388
0.48899   0.5512        0.56366   0.9508
0.49091   0.5593        0.56546   0.9629
0.49284   0.5675        0.56724   0.9752
0.49476   0.5758        0.56902   0.9875
0.49668   0.5841        0.57080   1.0000
0.49863   0.5926        0.57263   1.0126
0.50051   0.6012        0.57434   1.0253
0.50242   0.6098        0.57610   1.0381
0.50433   0.6185        0.57786   1.0511
0.50628   0.6273        0.57965   1.0641
0.50814   0.6363        0.58136   1.0773
0.51004   0.6453        0.58311   1.0906
0.51193   0.6544        0.58485   1.1040
0.51385   0.6636        0.58658   1.1175
0.51572   0.6729        0.58831   1.1311
0.51761   0.6823        0.59004   1.1449
0.51949   0.6918        0.59176   1.1588
0.52138   0.7014        0.59347   1.1728
0.52325   0.7111        0.59518   1.1869
0.52513   0.7209        0.59689   1.2011
0.52697   0.7308        0.59859   1.2155
0.52887   0.7408        0.60028   1.2300
0.53074   0.7509        0.60197   1.2446
0.53260   0.7611        0.60366   1.2593
0.53446   0.7714        0.60534   1.2742
0.53631   0.7819        0.60701   1.2892
0.53816   0.7924        0.60868   1.3043
0.54001   0.8030        0.61035   1.3195
0.54185   0.8137        0.61201   1.3349
0.54369   0.8245        0.61366   1.3504
0.54553   0.8355        0.61531   1.3660
0.54736   0.8465        0.61696   1.3817
0.54919   0.8577        0.61859   1.3976
0.55101   0.8689        0.62023   1.4136

0.62186   1.4297        0.77712   4.42
0.62348   1.4460        0.78195   4.59
0.62509   1.4623        0.78666   4.77
0.62671   1.4788        0.79126   4.96
0.62831   1.4955        0.79574   5.14
0.62991   1.5123        0.80012   5.33
0.63151   1.5292        0.80439   5.52
0.63310   1.5462        0.80855   5.73
0.63468   1.5634        0.81262   5.94
0.63626   1.5807        0.81658   6.14
0.63784   1.5981        0.82044   6.36
0.63940   1.6157        0.82421   6.58
0.64097   1.6334        0.82788   6.80
0.64252   1.6512        0.83147   7.03
0.64408   1.6692        0.83496   7.26
0.64562   1.6873        0.83837   7.50
0.64735   1.7057        0.84169   7.74
0.64870   1.7240        0.84493   7.98
0.65023   1.7425        0.84809   8.23
0.65175   1.7611        0.85117   8.49
0.65327   1.7799        0.85417   8.75
0.66076   1.88          0.85710   9.01
0.66814   1.98          0.85995   9.28
0.67536   2.08          0.86274   9.55
0.68244   2.19          0.86545   9.83
0.68938   2.30          0.868     10.11
0.69618   2.41          0.871     10.40
0.70284   2.53          0.873     10.69
0.70936   2.65          0.876     10.99
0.71574   2.77          0.878     11.29
0.72198   2.90          0.880     11.60
0.72808   3.04
0.73405   3.17
0.73988   3.32
0.74558   3.46
0.75116   3.61
0.75660   3.76
0.76191   3.92
0.76710   4.08
0.77217   4.25

A more extensive table listing larger entries of $s^2/(\bar{x} - x_0)^2$ is available
in reference [ ].


Table 2

Auxiliary Estimating Function $\hat{\lambda}$ for Singly Censored Samples

$s^2/(\bar{x}-x_0)^2$*                          h
           .01      .02      .03      .04      .05      .10      .15

  .00     .0101    .0204    .0309    .0416    .05245   .1102    .1734
  .05     .01055   .02129   .03222   .04334   .05467   .1143    .1793
  .10     .01095   .02208   .03340   .04490   .05659   .1180    .1848
  .15     .01131   .02280   .03446   .04632   .05836   .1215    .1898
  .20     .01164   .02346   .03545   .04763   .05999   .1247    .1946
  .25     .01195   .02408   .03638   .04886   .06152   .1277    .1991
  .30     .01224   .02466   .03725   .05002   .06297   .1306    .2034
  .35     .01252   .02521   .03808   .05112   .06434   .1333    .2075
  .40     .01278   .02574   .03887   .05217   .06566   .1360    .2114
  .45     .01304   .02624   .03962   .05318   .06692   .1385    .2152
  .50     .01328   .02673   .04035   .05415   .06813   .1409    .2188
  .55     .01351   .02720   .04105   .05509   .06930   .1432    .2223
  .60     .01374   .02765   .04173   .05600   .07044   .1455    .2258
  .65     .01396   .02809   .04239   .05687   .07154   .1477    .2291
  .70     .01417   .02851   .04303   .05773   .07260   .1499    .2323
  .75     .01438   .02893   .04365   .05855   .07364   .1520    .2355
  .80     .01458   .02933   .04426   .05936   .07465   .1540    .2386
  .85     .01478   .02972   .04485   .06015   .07564   .1560    .2416
  .90     .01497   .03011   .04542   .06092   .07660   .1580    .2445
  .95     .01515   .03048   .04599   .06167   .07755   .1599    .2474
 1.00     .01534   .03085   .04654   .06241   .07847   .1617    .2502

$s^2/(\bar{x}-x_0)^2$*                          h
           .20      .25      .30      .35      .40      .45      .50

  .00     .2427    .3185    .4021    .4941    .5961    .7096    .8368
  .05     .2503    .3279    .4130    .5066    .6101    .7251    .8539
  .10     .2574    .3366    .4233    .5184    .6234    .7400    .8703
  .15     .2640    .3448    .4329    .5296    .6361    .7542    .8860
  .20     .2703    .3525    .4422    .5403    .6483    .7678    .9012
  .25     .2763    .3599    .4510    .5506    .6600    .7810    .9158
  .30     .2819    .3670    .4595    .5604    .6712    .7937    .9299
  .35     .2874    .3738    .4676    .5699    .6821    .8060    .9437
  .40     .2926    .3803    .4755    .5791    .6927    .8179    .9570
  .45     .2976    .3866    .4831    .5880    .7029    .8295    .9700
  .50     .3025    .3928    .4904    .5967    .7129    .8407    .9826
  .55     .3073    .3987    .4976    .6051    .7225    .8517    .9949
  .60     .3118    .4045    .5046    .6133    .7320    .8625   1.0070
  .65     .3163    .4101    .5114    .6213    .7412    .8730   1.0188
  .70     .3206    .4156    .5180    .6291    .7502    .8832   1.0303
  .75     .3249    .4209    .5244    .6367    .7590    .8932   1.0416
  .80     .3290    .4261    .5308    .6441    .7676    .9031   1.0527
  .85     .3331    .4332    .5370    .6514    .7761    .9127   1.0636
  .90     .3370    .4362    .5430    .6586    .7844    .9222   1.0742
  .95     .3409    .4411    .5490    .6656    .7925    .9314   1.0847
 1.00     .3447    .4459    .5548    .6725    .8005    .9406   1.0951

* In type II censored samples $x_0$ is replaced by $x_n$.


Figure  3  Estimation  Curves  For  Singly  Censored  Samples 


Figure 4
INSTRUCTIONS

1. Locate the $(\bar{x} - x_0)/w$ curve corresponding to the sample value of this
quantity. Interpolate if necessary.

2. Follow the curve located in (1) to the point where it intersects with the $s^2/w^2$
curve for the corresponding sample value. If necessary, interpolate here
also.

3. Coordinates of the intersection determined in (2), which may be read on
scales along the base and the left edge of the chart, are the required
values of $\hat{\xi}_1$ and $\hat{\xi}_2$.


REFERENCES

1. COHEN, A.C., JR., "Estimating the mean and variance of normal populations
from singly truncated and doubly truncated samples", Ann. Math. Statist.,
Vol. 21 (1950), pp. 557-69.

2. COHEN, A.C., JR., and WOODWARD, JOHN, "Tables of Pearson-Lee-Fisher functions
of singly truncated normal distributions", Biometrics, Vol. 9 (1953),
pp. 489-97.

3. COHEN, A.C., JR., "Restriction and selection in samples from bivariate
normal distributions", J. Amer. Statist. Assn., Vol. 50 (1955), pp. 884-93.

4. COHEN, A.C., JR., "On the solution of estimating equations for truncated
and censored samples from normal populations", Biometrika, Vol. 44 (1957),
pp. 225-36.

5. COHEN, A.C., JR., "Restriction and selection in multinormal distributions",
Ann. Math. Statist., Vol. 28 (1957), pp. 731-41.

6. COHEN, A.C., JR., "Simplified Estimators for the normal distribution when
samples are singly censored or truncated", Technical Report No. 14,
Contract No. DA-01-009-ORD-463, Dept. of Math., University of Georgia (1958).
This paper is to appear in Issue No. 3 of Technometrics.

7. FISHER, R.A., "Properties of functions", Math. Tables, Vol. 1, British
Assn. for Advancement of Sciences (1931), pp. xxvi-xxxv.

8. GUPTA, A.K., "Estimation of the mean and standard deviation of a normal
population from a censored sample", Biometrika, Vol. 39 (1952), pp. 260-73.

9. HALD, A., "Maximum likelihood estimation of the parameters of a normal
distribution which is truncated at a known point", Skandinavisk
Aktuarietidskrift, Vol. 32 (1949), pp. 119-34.

10. HALPERIN, M., "Estimation in the truncated normal distribution", J. Amer.
Statist. Assn., Vol. 47 (1952), pp. 457-65.

11. IPSEN, J., "A practical method of estimating the mean and standard
deviation of truncated normal distributions", Human Biology, Vol. 21 (1949),
pp. 1-16.

12. PEARSON, KARL and LEE, ALICE, "On the generalized probable error in multiple
normal correlation", Biometrika, Vol. 6 (1908), pp. 59-68.

13. SARHAN, A.E. and GREENBERG, B.G., "Estimation of location and scale parameters
by order statistics from singly and doubly censored samples", Ann. Math.
Statist., Vol. 27 (1956), pp. 427-51.

14. SAMPFORD, M.R., "The estimation of response-time distributions. Part II",
Biometrics, Vol. 8 (1952), pp. 307-69.

15. STEVENS, W.L., "The truncated normal distribution", (Appendix to paper by
C. I. Bliss on: The calculation of the time mortality curve.) Ann. Appl.
Biol., Vol. 24 (1937), pp. 815-52.



STATISTICAL  PROBLEMS  ASSOCIATED 
WITH MISSILE TESTING


Dr. Charles L. Carroll, Jr.
RCA Service Company
Missile Test Project, Patrick Air Force Base

There are a large number of statistical problems that arise in
connection with the testing of missiles at the Atlantic Missile Range.
These problems are somewhat different from the usual statistical problems
and do not appear to have captured the fancy of statisticians who are not
actively working at a missile range. In spite of this, there are some
significant problems and areas for future statistical research that are
suggested by missile testing. If this country's leading statisticians are
made aware of these problems and encouraged to pursue them, it is believed
that contribution to the national defense effort plus stimulation to the
individual statistician will be accomplished. Good work on these and
related problems is being carried on at the various missile test ranges,
but in some cases, due to the pressure of getting a job done on time, it
is not possible to formulate and solve the problems in their most general
and abstract form. It is felt that more work of this type is needed, and
it is the purpose of this paper to bring to your attention a few of these
statistical problems.

Before going into details concerning these problems, it would seem
proper to say something about the nature of the work that is done at the
Atlantic Missile Range and by RCA Service Company. In general, research
and development type missile programs are the type that use the range.
In this connection information and data useful in developing and
evaluating guidance systems, propulsion systems, aerodynamic characteristics
and the weapon system itself are required. RCA is under contract to assist
in developing an adequate range from the instrumentation point of view,
operate the range instrumentation on all tests and reduce the data obtained
from the instrumentation for all missile contractors using the Atlantic
Missile Range.

In order to cover an integrated part of the statistical problems in
the allotted time, attention is to be focused on those problems arising
from trajectory measuring systems. These are systems giving measurements
which can be used to reconstruct the trajectory or the path of the missile.
It might be said in passing that the accuracy requirements for some of
these systems are extremely rigid, for example, errors of less than one
part in 1,000,000 where the measurements are made from a point 500 miles
from the place where the event is happening.

There are at present two basic types of trajectory measuring systems:
(1) optical and (2) electronic. In the optical systems, there are:

(a) Cine-Theodolite

This is an optical instrument installed in astrodome towers.
The Askania Kth 53 is the standard theodolite in use and requires
two operators, one to track in azimuth and one in elevation.
Angular information from precision glass dials is photographed
on each frame together with the missile image. Dial photography is
by means of strobic lamps with all theodolites synchronized to "read
out" at the same time. Position data is obtained using least square
techniques from data from two or more instruments.

(b) Fixed Metric Camera

The standard fixed metric camera system at AMR consists of CZR
and RC-5 cameras mounted on three-axes gimbal mounts capable of being
oriented to cover the desired field of view. Each individual camera
gives the direction of a ray in space from the camera to the missile.
Least square methods lead to position data. In most cases, the
cameras are controlled remotely from the firing sequencer.

(c) Ballistic Camera

The ballistic camera system at Atlantic Missile Range includes
BC-4 cameras and K-37 cameras. These cameras photograph flashing
lights or flares at night. The positions of the stars are used to
orient the cameras. Because of the high inherent accuracy and
reliability of the system, the ballistic cameras are used for evaluation
and in-flight calibration of electronic trajectory systems. At the
present time, the BC-4 system is the most accurate instrumentation
on the range.

In addition there are other optical systems: engineering sequential
optical systems, intermediate focal length tracking telescopes, large
tracking telescopes (IGOR), ROTI.
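
The statement that least square methods lead to position data from two or more camera or theodolite rays can be illustrated with a short sketch. It is not the range's reduction program. It assumes, for illustration only, azimuth measured clockwise from north, east/north/up axes, and equal weight for every instrument; the function names are hypothetical. The point found is the one minimizing the sum of squared perpendicular distances to the rays.

```python
import math

def direction(az_deg, el_deg):
    """Unit vector for an azimuth/elevation pair.

    Assumed convention: azimuth clockwise from north, axes east/north/up.
    """
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.sin(az),
            math.cos(el) * math.cos(az),
            math.sin(el))

def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def _solve3(a, b):
    """Solve the 3x3 system a p = b by Cramer's rule."""
    d = _det3(a)
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][col] = b[i]
        solution.append(_det3(m) / d)
    return tuple(solution)

def least_squares_position(sites, directions):
    """Point closest, in the least squares sense, to a set of rays.

    Each ray starts at a site c and has unit direction d. The normal
    equations are  sum(I - d d^T) p = sum((I - d d^T) c).
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(sites, directions):
        for i in range(3):
            for j in range(3):
                proj = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += proj
                b[i] += proj * c[j]
    return _solve3(a, b)
```

At least two non-parallel rays are needed; with exactly parallel rays the normal equations are singular. Unequal instrument accuracies could be handled by weighting each ray's contribution, which this sketch does not attempt.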

In the electronic area there are several tracking systems:

(a) Radar

There are several types of radar on the range:

Mod II, a modified SCR-584 radar.

Mod IV, an X-band radar, a modification of the NIKE missile tracking
radar.

FPS-8, L-band, AF early-warning air surveillance radar.

FPS-16, a high precision radar developed by RCA, Defense
Electronic Products.

In general, the radar gives azimuth, elevation and range of
the missile and boresight corrections are applied to get position
data.

(b) AZUSA

AZUSA is a high precision electronic tracking device using a
crossed base line which gives, at a sampling rate of 10 samples per
second, two direction cosines and the range.

(c) DOVAP

Using the Doppler principle, an increase (over the previous
reading) in range sum from the transmitter to the missile and back
to the receiver is obtained. Given three or more simultaneous
readings, least square methods can be used to obtain position data.

There are several other electronic tracking systems under
development, for example, EXTRADOP, COTAR, SECOR.
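
As a small illustration of how radar azimuth, elevation and range lead to position data, the sketch below converts one reading to rectangular coordinates. The axis convention (east/north/up, azimuth clockwise from north) and the treatment of boresight corrections as simple angular offsets subtracted from the readings are assumptions made for the example, not a description of the range's procedure.

```python
import math

def radar_to_cartesian(az_deg, el_deg, slant_range,
                       az_bias_deg=0.0, el_bias_deg=0.0):
    """Convert a radar reading to east/north/up coordinates.

    az_bias_deg and el_bias_deg are hypothetical boresight corrections,
    modeled here as constant offsets removed from the raw angles.
    """
    az = math.radians(az_deg - az_bias_deg)
    el = math.radians(el_deg - el_bias_deg)
    horizontal = slant_range * math.cos(el)
    return (horizontal * math.sin(az),   # east
            horizontal * math.cos(az),   # north
            slant_range * math.sin(el))  # up
```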

In connection with the trajectory systems there are two types of
problems of particular interest.

(1) Real Time Problems

In these problems, the computation must be accomplished
essentially simultaneously with the event. Examples are:

(a) Impact prediction

(b) Apogee prediction

(c) Nose cone location

(d) Quick look data

(2) Data Reduction Problems

In data reduction problems time is not the primary consideration,
but maximum effort is exerted to obtain optimum amount of information
from the data collected.

In performing these functions many interesting statistical
problems present themselves.

Problem (1) The Accuracy Problem

Basically the question here is: How accurate are the data obtained
from each instrument and what can be done to improve the accuracy?

We are interested in the accuracy problem from several points of
view:

(i) From a test-by-test point of view. In order for the data
to be useful in the evaluation of guidance systems, propulsion systems,
etc. on an individual test, the data must be known to be sufficiently
accurate. In addition, this information is required to evaluate and
improve the performance of the system.

(ii) From the long range point of view. In order to develop a
range with the required capabilities, it is necessary that for each
instrumentation system the following be known:




(a) The inherent or theoretical accuracy capabilities of
the system as it exists on the range.

(b) The accuracy that is being achieved on the range
operationally.

(c) Methods for improving the operational accuracy and
making it approach the inherent accuracy of the system.

(d) Modifications in the hardware of the system required
to improve the inherent or theoretical accuracy of the system.

(iii) Error Studies for the various systems. This involves an
understanding of: (a) systematic errors and (b) random errors.

(a) Present methods for getting information about the
systematic errors are:

(1) Comparison of measured data from the system
with data computed from ballistic camera data, which is
considered an order of magnitude better than most tracking
data.

(2) Comparison of data from the given system with a
best estimate of the trajectory obtained from all
instrumentation.

(3) Construction of a mathematical model of the
system and the analysis of the systematic errors.

(4) Study of residuals when least square methods
are used.

It should be kept in mind that in general the systematic errors
for the various systems are from one to one hundred times as large
as the random errors. The systematic errors are not constant and
appear to behave as stochastic variables.

(b) Present methods for getting information concerning
the random errors:

(1) Variate difference methods using 20-50
consecutive data points.

(2) Polynomial curve fitting using the F-test.

(3) Considering the residuals of a moving arc
technique.
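
The variate difference method listed under (b)(1) can be sketched as follows. The sketch assumes the textbook form of the method: if the trend over the span is a polynomial of degree less than k and the random errors are uncorrelated with common variance, the k-th differences have variance C(2k, k) times the error variance. The function names and the choice of k are illustrative and are not taken from the range's programs.

```python
import math

def kth_differences(x, k):
    """Return the k-th successive differences of an equispaced series."""
    d = list(x)
    for _ in range(k):
        d = [later - earlier for earlier, later in zip(d, d[1:])]
    return d

def variate_difference_variance(x, k):
    """Estimate the random error variance from k-th differences.

    Assumes uncorrelated errors and a trend that is removed by
    differencing k times (polynomial of degree less than k).
    """
    d = kth_differences(x, k)
    if not d:
        raise ValueError("series too short for the requested difference order")
    return sum(v * v for v in d) / (len(d) * math.comb(2 * k, k))
```

In practice the estimate would be computed for successive values of k and accepted once it stabilizes. With only 20 to 50 points the estimate is itself quite variable, and correlated errors violate the assumption on which the divisor rests.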

It is particularly desirable to have a good estimate of the random
errors for a particular system on a particular test as functions of
time. This information is needed to settle such important questions as
whether the successive errors in a measured quantity are correlated
or not, what are optimum smoothing functions and what are the best
methods of estimating velocity and acceleration data from position
data.

Problem (2) Determine Optimum Methods of Determining Velocity and
Acceleration Data from Position Data.

Almost all present range instrumentation measures quantities which
lead most directly to position data. There is a great need for obtaining
accurate velocity and acceleration data which tax the state of the art in
both range instrumentation and data reduction. This problem is extremely
important in both real time problems and in data reduction problems.

In data reduction, most of the present methods assume the errors in
successive measurements are uncorrelated and use moving arc techniques which
essentially fit a second or third degree polynomial by least square methods
to two to three seconds of data, evaluating the first and second
derivative of this polynomial at the mid point.
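
A minimal sketch of the moving arc technique just described is given below for the second degree case with equispaced data. It uses the closed-form least squares solution for a window symmetric about the mid point; the function name, the window size and the sample interval are assumptions of the example, and the errors are taken as uncorrelated with equal variance, as in the text.

```python
def moving_arc_derivatives(x, h, half_width):
    """Velocity and acceleration at the mid point of each moving arc.

    x          equispaced position data
    h          time between samples
    half_width number of points on each side of the mid point

    A quadratic a + b*j + c*j**2 (j = -m..m, in sample units) is fitted
    by least squares to each window. The first derivative at the mid
    point is b/h and the second derivative is 2*c/h**2.
    """
    m = half_width
    offsets = range(-m, m + 1)
    n = 2 * m + 1
    s2 = sum(j * j for j in offsets)
    s4 = sum(j ** 4 for j in offsets)
    results = []
    for mid in range(m, len(x) - m):
        window = x[mid - m: mid + m + 1]
        sx = sum(window)
        sjx = sum(j * v for j, v in zip(offsets, window))
        sj2x = sum(j * j * v for j, v in zip(offsets, window))
        b = sjx / s2
        c = (sj2x - s2 * sx / n) / (s4 - s2 * s2 / n)
        results.append((mid, b / h, 2.0 * c / (h * h)))
    return results
```

For data at 10 samples per second, a half width of 10 gives a two second arc of 21 points. Because the weights depend only on the offsets, they can be computed once and applied as a fixed linear filter, which is the property that makes this form inexpensive.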

For the real time problems, methods have been developed called "almost
least square" techniques which do the equivalent of this in a much shorter
time.


Problem (2) is closely related to Problem (1) in that to settle it,
the nature of the random errors must be known explicitly. Related
problems are:

Problem (2.1) Given that the errors in the measured position
data x_k, x_{k+1}, ..., x_{k+j} obtained at intervals of time t are
correlated with known autocorrelation function R, determine methods
of obtaining dx_{k+j}/dt which are efficient from a computational point of
view (i.e., could be used in real time).

Problem (2.2) Given that the errors in the measured position
data x_k, ..., x_{k+j} obtained at intervals of time t satisfy
the relation


Ij^+j  =  cos  t 

determine methods of obtaining dx_{k+j}/dt which are efficient from a
computational point of view.

Problem (3) Determine Methods Which Can Be Programmed For An Electronic
Computer For Editing Data.

Since enormous quantities of data are processed, attempts are made to
automate its handling as much as possible through the use of electronic
computers. But no matter how it is done, it is the old problem of rejection
of outlying data and no completely satisfactory solution is available.
It is desired to remove the data that is in error but under no
circumstances should useful information be removed from the data.

There is interest in this problem in:

(a) Both real time problems and data reduction problems.

(b) Editing both input and output data for the computer.

(c) Editing large discrepancies in the data.

(d) Fine grain editing of data.

In some applications, an attempt is made to remove only the very
large errors. In other applications, there are certain peculiarities
which are to be edited, for example, in a certain record the error
will either be a 1 or a 0 and it is desired to isolate the error and
remove it.
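
To make the editing of very large errors concrete, the sketch below flags points that lie far from a running median, with the spread measured by the median absolute deviation in the same window. This is one common robust screening device offered purely as an illustration; it is not a method described in this paper, and the window size and threshold are arbitrary assumptions. It does not resolve the difficulty stated above, since any threshold risks discarding useful information.

```python
import statistics

def flag_gross_errors(x, half_width=5, threshold=6.0):
    """Flag points far from the running median of their neighborhood.

    A point is flagged when its distance from the window median exceeds
    `threshold` times a robust scale (1.4826 * median absolute deviation).
    Both parameters are illustrative choices.
    """
    flags = []
    n = len(x)
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = x[lo:hi]
        center = statistics.median(window)
        mad = statistics.median([abs(v - center) for v in window])
        scale = 1.4826 * mad
        flags.append(scale > 0 and abs(x[i] - center) > threshold * scale)
    return flags
```

A steep trend within the window inflates the scale, so in practice such a screen would more likely be applied to residuals from a smoothing fit than to raw position data.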

Problem (4) Determine Optimum Methods of Smoothing of Data.

This problem is closely related to No. 3 and much has been said and
written about it but it remains an important area for additional work.

Problem (5) Extend the Variate Difference Method to Unequispaced
Intervals.

The variate difference method that has been developed for equispaced
time intervals has been very useful at AFMTC. At the present time a
Monte Carlo evaluation of this method is underway. There are applications
in which it would be desirable to have the variate difference method
extended to unequispaced variables.

Problem (6) Design of Experiments.

An example will suffice: a single piece of complicated electronic
tracking equipment has been developed to meet certain specifications:

(1) Accuracy

(2) Reliability

(3) Maintainability

Design a test to determine whether the specifications have been met or
not.

Problem (7) Given an Instrumentation System Consisting of n
Instruments with Known Accuracy, Considering Geographic Limitations,
Determine the Location of the Instrument Sites to Give Optimum Accuracy
with Respect to Trajectory Data for a Specified Intended Trajectory.

Problem (8) Determine the Reliability of a Specified Tracking System,
Range Safety System or Communication System.


Problem (9) Given a Pencil of n Lines in Space, Possibly Specified by
a Set of n Azimuth and Elevation Angles as Measured from a Single Point P,
[illegible] Are Subject to Errors in Measurement, Determine the Conical
[illegible] Which Best Fits this Data.


APPLICATIONS OF SEQUENTIAL TYPE DESIGNS
AND ANALYSES TO FIELD TESTS


Harold R. Rush

Quartermaster Field Evaluation Agency
Headquarters Quartermaster Research and Engineering Command, US Army


INTRODUCTION. It is a well-known and often pointed out fact that the
conditions under which an experiment is conducted may differ widely from
conditions of actual usage. With this in mind a program for testing
Quartermaster items of food, clothing, and equipment is conducted under
controlled field conditions by the Quartermaster Research and Engineering
Field Evaluation Agency. This testing is accomplished prior to any
standardization procedures. A segment of this program is implemented by eight
accelerated Wear Courses on which test items are subjected to the
equivalent of months of simulated normal wear in a comparatively short time.


It has been found, however, that even an accelerated wear course can
require the use of considerable numbers of test subjects for extended periods
of time. Clearly the number of test subjects cannot be reduced without a
corresponding loss of precision in the test. If the number of subjects
cannot be reduced, an alternative approach is to seek some method for reducing
the manhour requirements per subject. Considerable savings in time have
been realized in quality control work and other industrial situations by use
of sequential type procedures. Thus it seemed logical to investigate the
possibility of applying such procedures to these field experiments where
savings in time were a desired goal.

The present investigation concerns an attempt to adapt the concept of
sequential type analysis for use with accelerated wear field tests of fabric
durability. The term sequential analysis is used here in its broader sense
and does not refer to the sampling procedure developed by Wald (15) and
applied by many investigators in various fields (2,4,5,6,8,13). The
important distinction between the field test situation and those circumstances
which have thus far proved amenable to sequential analysis concerns the
sampling or observation procedure. In the more typical sequential analyses a
series of independent observations is made on items subjected to a given
test, and the hypothesis is accepted, rejected, or a decision made to sample
additional independent items subjected to the same test. In field tests,
the sample size is usually established prior to the test and testing is
cyclical in nature producing cumulative wear or degradation. In these
instances, additional observations represent one more test cycle for the
entire fixed sample rather than one more sample subjected to a predetermined
amount of testing.


RESEARCH PROCEDURES. The procedures followed in devising an explicit
method for application of sequential type analysis to certain kinds of field
test data can be described as a logical extension or representation of what
many investigators have done intuitively. When an experiment has progressed
to some logical stopping point, an experimenter frequently will apply
appropriate statistical tests to determine to what extent observed differences
may be attributable to chance. If differences are not statistically
significant at the desired level he may examine the data carefully for trends,
inconsistencies, etc., and then decide that the experiment should be continued
because a few more cases, assuming comparable results, will provide the
desired probability level or that the experiment should be stopped because the
erratic results thus far make it improbable that significant differences would
be obtained even with a large increase in the number of observations. This
intuitive approach can have obvious advantages over rigid adherence to a
predetermined test length. However, as pointed out by Anscombe (1), it is still
quite susceptible to error.

Some decision criterion other than intuition is obviously needed to
formalize the reasoning behind such thinking. Three requisites can be used
to enable a more definitive and formal basis for making such judgments: a
knowledge of the magnitude of the true differences one wishes to detect, a
valid estimate of the population variance, and a reliable estimate of the
probability of detecting a given difference at the specified significance
level. The exact calculation of this probability, given by Neyman et al (12),
is rather complicated and unwieldy. However, an adequate approximate
solution for this probability, denoted as P, can be obtained simply by solving
for t2 in the following equation: (The notation, in general, follows that
used in (7)).

r = 2 (σ/δ)² (t1 + t2)²

where

δ = specified true difference desired to detect

σ² = population variance (estimated from the experiment)

t1 = value of t, with degrees of freedom for σ², from
Student's "t" table corresponding to a fixed risk,
α, of rejecting the null hypothesis when it is
true (one-tailed test)

r = number of replications per treatment

Having obtained t2, the desired probability P of obtaining a
significant result is the probability that a value of t, with appropriate degrees
of freedom, should exceed t2. Since the ordinary t-table gives
probabilities, which we designate as P', that a value lies outside the limits ± t2,
the required probability, P, is 1 - (1/2)P'.
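
The calculation of P can be sketched numerically. The sketch assumes the approximation takes the familiar form r = 2(σ/δ)²(t1 + t2)², so that t2 = δ/(σ√(2/r)) - t1, and it evaluates P as the probability that t falls below t2, which equals 1 - (1/2)P' when t2 is positive. Student's t distribution is integrated by Simpson's rule so that only the standard library is needed. The function names are hypothetical, and the degrees of freedom must be supplied by the user.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t) by Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

def t_quantile(p, df):
    """Value t with P(T <= t) = p, found by bisection."""
    lo, hi = -200.0, 200.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def detection_probability(delta, sigma, r, df, alpha=0.05):
    """Approximate probability P of detecting a true difference delta.

    delta and sigma must be in the same units (for example, per cent of
    the general mean); r is the number of replications per treatment.
    """
    t1 = t_quantile(1.0 - alpha, df)          # one-tailed critical value
    t2 = delta / (sigma * math.sqrt(2.0 / r)) - t1
    return t_cdf(t2, df)
```

As a check under further assumptions (α = .05, and 108 degrees of freedom for error as in a randomized block design with 4 fabrics and 37 subjects), `detection_probability(20, 38, 37, 108)` gives roughly .73, in line with the value quoted in the RESULTS section for a standard error of 38 per cent, a sample size of 37 and a difference of 20 per cent.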

The value of P can be calculated at regular discrete intervals during
the test and used as the basis for a decision to stop testing or to continue
testing until the next interval. The criterion value of P for such a
decision can be set arbitrarily depending on the purpose of the investigator
and the nature of the particular testing situation. Figure 1* has been
prepared to facilitate the determination of the P values associated with
various values of r, δ and α.

The utility of such a procedure was evaluated by applying it to data
obtained from several tests previously conducted on the Field Evaluation
Agency's Fabric Courses. The data from these field tests become available
in time sequence and a large number of test subjects are required for
protracted periods of time. Thus, such tests are typical of those which
appear amenable to sequential procedures and from which maximum benefits
in efficiency might be expected.

*Figures can be found at the end of this article.

The Field Evaluation Agency's Fabric Courses are designed to enable
comparative evaluation of fabrics in terms of durability under conditions
approximating accelerated normal wear. The courses consist of a series of
obstacles which when traversed by subjects wearing test garments produce
types and amounts of wear similar to that observed in garments salvaged
from normal field use but at a faster rate. The complete course is used
for testing cotton fabrics, Figure 2, and a modified version is used for
wool fabrics, Figure 3. One pair of trousers made from each test fabric is
assigned each man participating in the test. The garments are worn through
a pre-planned number of cycles with a cycle consisting of two traversals of
the fabric course and one laundering. The trousers are examined after each
cycle and failures are charted and scored by means of a weighted scoring
system developed by the Field Evaluation Agency. A statistical analysis is
made at the end of the number of cycles designated in the test plan to
determine if differences in average wear scores of fabric types are significant.
Statistically speaking, the experimental design used is that of randomized
blocks, where each test subject is a "block". The order of wear of the fabric
types is randomized with each fabric type being represented during each wear
cycle.


Since it would be unwise to terminate a test prior to the end-point in
durability for which an item is designed or before a reliable trend has
become established, the selection of a test interval at which the test
termination criterion will be first applied must be based on experience with the
items and test methods involved. For the Fabric Course much background
information was available for use in determining the earliest point at which it
would be desirable to examine the data. A typical graph of the average
cumulative wear score per cycle for a fabric tested on the fabric course climbs
steeply during the earlier cycles, then reaches a point at which the curve
tends to flatten out. This flattening of the curve indicates that the fabric
has been worn beyond a point where the wear score will faithfully reflect
further fabric deterioration. The point for the initial analysis of the data
should be after reliable wear trends have developed and prior to this
flattening of the wear curves. Data available from a large number of past tests
suggest that reliable wear trends have developed when wear scores
approximate values of 40 for cotton fabrics and 25 for wool serge and similar wool
blend fabrics.

Since there is usually considerable variation in the number of cycles
required to obtain these critical scores for different fabrics within a test,
testing continues until the lowest average wear score approximates 40 for
cotton and 25 for wool and wool blends. It should be emphasized that these
values are applicable to the fabrics investigated in this study and that
they can change as the fabrics investigated change. For example, critical
scores for wool shirtings are higher than those of wool serge. A careful
check of cumulative wear plots should be maintained across successive tests
to determine these critical scores for the different types of fabrics.




Having established the point at which an analysis of the data would start,
the familiar analysis of variance was performed and an appropriate test for
comparing means, such as Duncan's multiple range test (10) was employed. The
investigation would end in the unlikely event that all differences between
adjacent ranked means exceeded √2 s_x̄ t_α, that is, all fabrics differed
significantly from each other. Normally, though some of the means may be found
to differ significantly, other means will be grouped in such a manner that
the decision to accept the null hypothesis for these means or to continue the
test is surrounded by uncertainty. These are the means of concern and to
which the criteria for termination are applied. It is at this point that a
value of P is computed, utilizing the sample variance as an estimate of σ²
and setting δ at 20 per cent as experience has shown that detection of
differences in fabric durability of less than 20 per cent is unlikely. This
value of P is then evaluated against some predetermined level. For the fabric
course a P of .80 was selected as giving reasonable protection levels against
committing a Type II error. To facilitate computations, the relationship
between selected values of r, associated standard errors, and true mean
differences for P = .80 are given in Figure 4.

RESULTS. The results of the application of this type of sequential
analysis to a test of cotton fabrics (13) can be seen from Table I (Slide #5).
In that test four different cotton fabrics were run for 10 cycles on the
fabric course. Using the usual procedure, only that analysis shown in the
last row of Table I would be made. However, examination of the mean wear
scores by cycle indicated that sequential statistical analysis should start
after cycle 7 where the lowest average wear score, 38.5, was an adequate
approximation of the minimum wear score required for establishment of
reliable trends. Analysis of the data at this cycle allowed definitive
statements to be made with respect to types K and CVi. The maximum difference
between adjacent ranked means for the remaining fabrics, in this instance
CS and KR, expressed as a per cent of the general mean was 14 per cent.
With a standard error of 38 per cent and a sample size of 37, it is seen
from Figure 1 that the probability of detecting a difference of as much as
20 per cent is .73. Applying the stipulated decision criteria, the test
would be terminated at this point. Comparing these results with those
obtained at the end of the full ten cycles showed that substantially the same
conclusions would be drawn with the same level of confidence. The seemingly
desirable inverse relationship between the number of cycles and the
magnitude of the standard error as seen in Table I can be misleading, since
it is accompanied by decreasing proportional differences between fabric
wear scores as maximum wear is approached, thereby suggesting a loss in
sensitivity.
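
The probability quoted above can be approximated with a short calculation.
The following is only a sketch under stated assumptions, not the procedure
behind Figure 1: it assumes a comparison of two fabric means, a normal
approximation, a one-sided 5 per cent significance level, and that the
tabulated standard error is the per-observation error expressed as a per cent
of the mean. The function name `detection_probability` and its parameters are
illustrative.

```python
from math import sqrt
from statistics import NormalDist


def detection_probability(se_percent, r, delta_percent, alpha=0.05):
    """Approximate probability of detecting a true difference between two means.

    se_percent    -- per-observation standard error, as a per cent of the mean
    r             -- number of replications per fabric
    delta_percent -- true difference to be detected, as a per cent of the mean
    alpha         -- one-sided significance level (an assumption, not from the paper)
    """
    std_normal = NormalDist()
    # Standard error of the difference between two means of r observations each.
    se_of_difference = se_percent * sqrt(2.0 / r)
    critical_value = std_normal.inv_cdf(1.0 - alpha)
    return std_normal.cdf(delta_percent / se_of_difference - critical_value)


# Standard errors for cycles 6 and 7 are taken from Table I.
for cycle, se in [(6, 42), (7, 38)]:
    p = detection_probability(se, r=37, delta_percent=20)
    print(f"cycle {cycle}: P = {p:.2f}")
```

Under these assumptions the sketch gives roughly .73 for cycle 7 and roughly
.66 for cycle 6, close to the tabulated values of .73 and .65; an exact
treatment based on the t distribution would give slightly smaller figures.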

In a similar manner, the foregoing procedure was applied to two additional
cotton fabric tests and three wool fabric tests. It was found that the
proposed procedure worked quite well for the cotton tests, allowing a
reduction in the test period of from 2 to 3 cycles. The same conclusions with
respect to fabric comparisons would be made after the proposed shortened test
period as were made at the original end of the test. Applying the same
criterion, that of conformance of results, to the three woolen tests, it was
found that only one behaved in such exemplary fashion. On the other two
woolen tests, some discrepancies in conclusions were noted. In these
instances, however, it was felt that the shortened test period gave a truer
evaluation of fabric differences.


Design  of  Experiments 




DISCUSSION. This method is not only of immediate value in enabling more
efficient fabric course testing while sustaining essentially the same results
as the longer, less efficient procedure, but is equally important for its
potential application to other comparable testing situations. Speaking only
of the realm of field testing, it might profitably be applied to durability
testing of many types of footwear, socks, gloves, and other clothing items.
Shipping containers and gasoline drums are other items whose tests might be
susceptible to this sort of analysis if the tests are designed with that in
mind. Given the present achievement, it is clearly desirable to investigate
similar applications in all instances where data become available for
analysis in time sequence and the other requisites are approximated.

In other such testing situations at the Agency which will be investigated,
a more general solution formulated by Tang (14) may be applicable. He has
investigated the sensitivity of the analysis of variance test for the general
case of t treatments, and prepared tables from which the size of the Type II
error can be determined, given the number of replicates, the treatment
effects, the size of the Type I error and a reliable estimate of $\sigma^2$.
Tang's procedure is concerned with testing for differences among all
treatments in a group, whereas the question posed in this study is whether,
after any particular cycle of wear, any two adjacent ranked treatments
(fabrics) differ significantly from each other.

A quite recent article by Bechhofer (3) presents a multiple decision
procedure for selecting the best one of several normal population means
with a common unknown variance, and, as pointed out in the article, the
problem of selecting and ordering the t populations with the largest
population means also can be treated within the same general theoretical
framework. However, the sampling procedure, which is the orthodox sequential
one, could be unworkable for the type of experiment discussed here.
Consider a test of five treatments (fabrics). It would require each test
subject approximately 4 weeks to complete 10 traversals of the fabric
course for this number of treatments. In other words, four weeks would be
required to obtain a single observation, and an additional four weeks for
each observation thereafter. It is apparent that the tests would take
several months to run under this sampling procedure.

SUMMARY. A sequential type approach to analysis of data obtained from
accelerated wear field tests is devised through adaptation of the statistical
concept of the power of a test. The method requires that the experiment be
cyclical in nature and that the data become available in time sequence.
From a knowledge of the magnitude of the differences it is desired to detect
in the experiment and an estimate of the population variance, the probability
of detecting such a difference at a given significance level is determined.
This probability is used after each cyclical analysis in making a decision
to continue or to terminate testing. Application of this procedure to a
number of accelerated wear fabric tests conducted on the Fabric Course
demonstrated a potential savings of 20 per cent to 33 per cent in test
personnel manhours without loss of meaningful information. The adaptation of
such methods to other field tests appears highly feasible.




[Figure 1. Probability of detecting true differences for specified values of
δ, r, and σ. Vertical axis: probability.]




[Figure: true difference as per cent of mean plotted against number of
replications. Axis labels: REPLICATIONS; TRUE DIFFERENCE AS PERCENT OF MEAN.]


                                TABLE I

          ANALYSIS OF DATA FROM FEA 53095, COTTON FABRIC TEST
                               (r = 37)

         Average Wear Score and          Standard             Mean    Value of
Cycle    Multiple Comparison Test(a)     Error(b)     F       Wear    P for
           CS      KR      K      CW                 Value    Score   δ = 20%

   6      30.3    38.4    48.7    67.2      42       24.82    46.1     0.65
   7      38.5    46.5    62.8    81.2      38       27.86    57.3     0.73
   8      (scores not legible)              37       26.80    65.7
   9      55.0    63.0    83.7   101.0      33       25.46    75.7
  10      65.9    76.7    94.8   112.6      30       22.21    87.5

(a) Duncan's multiple range test (10).
(b) As a percent of the mean.




BIBLIOGRAPHY 

1.  Anscombe, F. J., Fixed Sample Size Analysis of Sequential Observations,
    Biometrics 10, 89-100, 1954.

2.  Armitage, P., Sequential Tests in Prophylactic and Therapeutic Trials,
    Quart. Journ. Med. 23, 255-274, 1954.

3.  Bechhofer, Robert E., A Sequential Multiple Decision Procedure for
    Selecting the Best One of Several Normal Populations With a Common
    Unknown Variance, and Its Use With Various Experimental Designs,
    Biometrics 14, 408-429, 1958.

4.  Bradley, Ralph A., Some Statistical Methods in Taste Testing and
    Quality Evaluation, Biometrics 9, 22-28, 1953.

5.  Bross, I. D. J., Sequential Medical Plans, Biometrics 8, 183-187, 1952.

6.  Brownlee, K. A. et al., The Up-and-Down Method with Small Samples,
    Jour. Amer. Stat. Assoc. 48, 262-277, 1953.

7.  Cochran, W. G. & Cox, Gertrude, Experimental Design (2nd Ed.), New
    York: Wiley, 1957.

8.  Dixon, W. J., and Mood, A. M., A Method for Obtaining and Analyzing
    Sensitivity Data, Jour. Amer. Stat. Assoc. 43, 109-126, 1948.

9.  Duncan, D. B., Multiple Range and Multiple F Tests, Biometrics 11,
    1-42, 1955.

10. Fiske, D. W. & Jones, L. V., Sequential Analysis in Psychological
    Research, Psychol. Bull. 51, 264-276, 1954.

11. Matthews, J. M., FEA 53095, Wear Resistance of 9 oz. Cotton Sateen
    Blended with Rayon, R & E Field Evaluation Agency, Ft. Lee, Va., 195[?].

12. Neyman, J., Iwaszkiewicz, K. and Kolodziejczyk, St., Statistical
    Problems in Agricultural Experimentation, Suppl. Jour. Roy. Stat. Soc.
    2, 107-154, 1935.

13. Radkins, Andrew P., Sequential Analysis in Organoleptic Research:
    Triangle, Paired, Duo-Trio Tests, Food Research 23, 225-234, 1958.

14. Tang, P. C., The Power Function of the Analysis of Variance Tests
    With Tables and Illustrations of Their Use, Stat. Res. Mem. 2,
    126-149, 1938.

15. Wald, A., Sequential Analysis, New York: Wiley, 1947.




A  SEQUENTIAL  OBSERVATIONAL  PROGRAM  USED  IN  A 
STUDY  OF  A  RESPONSE  SURFACE  FOR  A  COMPLEX  WEAPONS  SYSTEM* 


William J. Wrobleski
The  University  of  Michigan 

The  family  of  probability  distributions  associated  with  the  number  of 
aircraft  destroyed  by  a  MISSILE  MASTER  Anti-Aircraft  Defense  System  in  raids 
which  belong  to  a  relevant  raid  space  is  a  response  surface  of  considerable 
importance  to  numerous  different  agencies.  Each  of  these  agencies  must  make 
a  variety  of  decisions  about  the  MISSILE  MASTER  System,  decisions  which  in¬ 
volve  huge  costs  and  must  reflect  the  essential  characteristics  of  this  family 
of  probability  distributions. 

In  order  to  study  this  response  surface  a  MISSILE  MASTER  System  was  viewed 
as  a  stochastic  structural  relation  between: 

(1) the primary random variable (i.e., the number of aircraft destroyed), and

(2)  a  set  of  secondary  random  variables  (i.e.,  certain  service  times). 

A  truncated  sequential  observational  program  was  conducted  in  stages.  It 
was  designed  to  study  the  secondary  random  variables  and  some  of  their  inter¬ 
relationships  which  were  imposed  because  the  system  acted  as  a  stochastic 
structure  between  primary  and  secondary  variables. 

This experimental design included specification of:

(1)  the  random  variables  observed  during  the  program, 

(2)  the  sampling  plan  for  each  stage  of  the  observational  program, 

(3)  the  terminal  decisions  made  at  the  end  of  each  stage,  and 

(4)  the  statistical  techniques  used  to  make  terminal  decisions. 

Except  for  the  initial  stage,  the  sampling  plan  for  each  successive  stage  of 
the  observational  program  was  based  upon  terminal  decisions  made  at  the  end 
of  each  of  the  preceding  stages.  It  was  specified  prior  to  the  beginning  of 
the  observational  program  that  if  experimentation  continued  through  a  fifth 
stage  it  was  to  be  terminated  at  the  end  of  that  stage  regardless  of  the  ter¬ 
minal  decisions  obtained  from  the  five  experimental  stages. 

From these studies a representation or model of a MISSILE MASTER System
was constructed using a digital computer, and a second observation program for
estimating the response surface from this representation was designed.

Before  proceeding  to  a  more  particularized  account  of  the  observational 
program  whose  general  characteristics  have  been  sketched,  I  shall  outline 
abstractly  the  central  estimation  problem  which  motivated  and  spanned  the 
entire  investigation.  This  abstract  formulation  will  establish  a  common 


* This work was conducted under contract to the United States Army Signal
Engineering Laboratories, Contract DA-56-039 SC-64627.




frame of reference for the subsequent accounts of the observational program
without necessitating a detailed description of a MISSILE MASTER System. It
will also provide motivation for the selection of those random variables that
were studied in the observational program.

To begin, let us consider the measurable space $(Z, \Omega)$ where $Z$ is the
collection of non-negative integers and $\Omega$ is the family of all subsets
of $Z$. In addition we consider a set $R$, called the raid space. Let $a$ be
the generic symbol for an aircraft and $r$ the generic symbol for a raid
contained in $R$. The relation $a \in r$ means "a is an aircraft in r"; and
we let $n(r)$ denote the cardinality of the set $\{a \mid a \in r\}$. For each
$r \in R$, the equivalence relation $i \sim j$ in $Z$ defined by
$i \equiv j \pmod{n(r) + 1}$ decomposes $Z$ into $n(r) + 1$ disjoint and
exhaustive subsets

$$Z_k(r) = \left\{ z \in Z \;\middle|\; \frac{z - k}{n(r) + 1} \in Z \right\}, \qquad k = 0, 1, 2, \ldots, n(r).$$

Correspond each $z \in Z_k(r)$ with $k$ and denote this measurable function
from $Z$ to $Z$ by $N(z \mid r)$.
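
The decomposition of Z into residue classes can be illustrated concretely.
The sketch below is an illustration only; the function name
`destroyed_count` and the choice of a raid with three aircraft are
assumptions made for the example.

```python
def destroyed_count(z, n_r):
    """Return the index k of the class Z_k(r) containing z, i.e. z mod (n(r) + 1)."""
    if z < 0 or n_r < 0:
        raise ValueError("z and n_r must be non-negative integers")
    return z % (n_r + 1)


# Hypothetical raid with n(r) = 3 aircraft: the classes Z_0(r), ..., Z_3(r)
# restricted to the integers 0..11.
n_r = 3
classes = {
    k: [z for z in range(12) if destroyed_count(z, n_r) == k]
    for k in range(n_r + 1)
}
print(classes)
```

Every non-negative integer falls in exactly one class, and the class index
always lies between 0 and n(r), which is what allows it to be read as a
number of aircraft destroyed.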

A MISSILE MASTER Anti-Aircraft Defense System, hereafter denoted by S,
consists of a ring of missile batteries and an automatized coordination center
for the missile batteries of the ring. Now interpreting $N(z \mid r)$ as the
number of aircraft in the raid $r \in R$ destroyed by S, S may be viewed as a
mechanism by which an observation of the random variable $N(z \mid r)$ is
generated. In essence S generates a probability distribution
$\pi(r,S) = \{\pi_0, \pi_1, \ldots, \pi_{n(r)}\}$ on $(Z, \Omega)$, where
$\pi_k(r,S) = m(Z_k(r) \mid S) = \Pr(\{z \mid N(z \mid r) = k\} \mid S)$,
$k = 0, 1, \ldots, n(r)$.
Denoting $\{\pi(r,S) \mid r \in R\}$ by $\pi(R,S)$, the problem of estimating
$\pi(R,S)$ was the central problem which motivated the entire observational
program. $\pi(R,S)$ has been variously called the response surface of S
relative to R, the performance characteristic space of S relative to R, and
the output state space of S relative to R.

Following are some questions which occur in designing an observational
program for estimating $\pi(R,S)$:

(1) If J denotes an index set for the number of different raids to be
flown against S during the observational program, what should its cardinality
n(J) be?

(2) Corresponding to each $j \in J$, what choice should be made for the tuple
$(r_j, m_j)$, where $r_j \in R$ and $m_j$ denotes the number of replications
of $r_j$ to be made during the observational program?

(3) Should n(J), $m_1, m_2, \ldots$ be constants or random variables?

(4) Is there a natural "estimation topology" for the space $\pi(R,S)$? If
there were such a topology $\theta$, the $r_j$'s could be selected so that
$\{\pi(r_j, S) \mid j \in J\}$ is a $\theta$-dense set in $\pi(R,S)$.

Given a tuple (r, m), m observations of the random variable $N(z \mid r)$
would be made during the observational program, say $N_1, N_2, \ldots, N_m$,
and $\pi(r,S)$ estimated by
$\hat{\pi}(r,S) = \{\hat{\pi}_0, \hat{\pi}_1, \ldots, \hat{\pi}_{n(r)}\}$,
where $\hat{\pi}_k = m_k / m$ and $m_k$ denotes the number of $N_j$'s which
are equal to k, j = 1, 2, ..., m and k = 0, 1, ..., n(r).
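
The estimate described here is a relative frequency: the share of the m
replications in which exactly k aircraft were destroyed. A minimal sketch
follows; the function name `estimate_distribution` and the sample
observations are hypothetical.

```python
from collections import Counter


def estimate_distribution(observations, n_r):
    """Estimate pi_k for k = 0..n_r as the fraction of observations equal to k."""
    if not observations:
        raise ValueError("at least one observation is required")
    counts = Counter(observations)
    m = len(observations)
    return [counts.get(k, 0) / m for k in range(n_r + 1)]


# Hypothetical outcomes N_1, ..., N_8 for a raid with n(r) = 3 aircraft.
pi_hat = estimate_distribution([0, 2, 1, 2, 3, 2, 1, 0], n_r=3)
print(pi_hat)
```

The estimated probabilities sum to one by construction, and their precision
depends on the number of replications m chosen for the raid.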




However, the number of aircraft destroyed by S in a mock raid cannot
be observed directly; and so the following question arises in designing an
observational program for estimating $\pi(R,S)$: Does there exist a random
variable which is observable and from which the value of $N(z \mid r)$ can be
inferred?

Let b be the generic symbol for a missile battery, $b \mid a$ the generic
symbol for the event that b destroys a, and $P(b \mid a)$ the generic symbol
for the probability that the event $b \mid a$ occurs. Suppose $P(b \mid a)$
were known as a function $P(\delta)$ of the distance $\delta = \delta(a,b)$
of a from b. Then if, at the instants $\xi_1 \le \xi_2 \le \cdots$ when b
simulates a missile launch at a, the distances $\delta_1, \delta_2, \ldots$
of a from b were known for each b in the set of $b \in S$ that simulate a
missile launch at a, and for each $a \in r$, an observation of the
random variable $N(z \mid r)$ could be generated.

Therefore, if $P(b \mid a)$ were known as a function of $\delta$, a
reasonable observational program would be the following:

(1) Employ a sampling plan evolved from consideration of questions 1, 2,
3 and 4; and for each replication of a raid specified by the sampling plan
observe the distance of b from a at the instants of simulated missile launches
for each battery-aircraft assignment combination (b,a) made during the raid.

(2) From these observations calculate $P(b \mid a)$ from the function
$P(\delta)$ and generate an observation of the random variable
$N(z \mid r)$.

(3) Employ the estimate suggested, namely $\hat{\pi}_k = m_k / m$, to make
the terminal decisions about the magnitude of $\pi_k \in \pi(r,S)$ for
k = 0, 1, 2, ..., n(r).
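
Step (2) of this program can be sketched as a small simulation. The sketch
rests on assumptions that are not in the text: the kill-probability function
`kill_probability` is a purely hypothetical exponential decay standing in for
the unknown function of distance, simulated launches are treated as
independent, and the raid data are invented for illustration.

```python
import math
import random


def kill_probability(distance_km, scale_km=30.0):
    """Hypothetical kill probability as a function of distance (not from the paper)."""
    return math.exp(-distance_km / scale_km)


def simulate_destroyed(launch_distances, rng):
    """Generate one observation of the number of aircraft destroyed in a raid.

    launch_distances maps each aircraft to the list of battery-to-aircraft
    distances observed at the instants of simulated missile launches.
    """
    destroyed = 0
    for distances in launch_distances.values():
        # Probability the aircraft survives every launch (independence assumed).
        survive = 1.0
        for d in distances:
            survive *= 1.0 - kill_probability(d)
        if rng.random() < 1.0 - survive:
            destroyed += 1
    return destroyed


rng = random.Random(1958)  # fixed seed so the sketch is repeatable
raid = {"a1": [40.0, 25.0], "a2": [35.0], "a3": []}  # a3 is never engaged
observations = [simulate_destroyed(raid, rng) for _ in range(200)]
print(sorted(set(observations)))
```

Each simulated value plays the role of one replication of the raid, and the
relative frequencies of the values 0, 1, ..., n(r) over many replications
would give the estimate used in step (3).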

The modifications of this observational program which yield the
observational program used for experimentation with the MISSILE MASTER System
that was studied can be motivated by considering the nature of system effects
on $\pi(R,S)$. For this purpose a useful classification of system effects is
the following:

(1) STRUCTURAL EFFECTS produced by the characteristics and configurations
of the system's equipment complex. The number of missile batteries belonging
to S is an example of a characteristic of the battery configuration of the
system's equipment complex which yields a structural effect on $\pi(R,S)$.

(2) PROCEDURAL EFFECTS produced by the system's rules of operation or
standing operating procedures which weld together components of the system's
equipment complex. For example, the missile batteries and positional
information tracking components of the system's equipment complex are
connected by the system's assignment doctrine. The assignment doctrine yields
a procedural effect on $\pi(R,S)$.

(3) OPERATOR EFFECTS produced by individual differences in operating the
system's equipment complex and executing the system's standing operating
procedures.




(4)  ENVIRONMENTAL  EFFECTS  produced  by  the  climatic  and  topographical 
characteristics  of  the  system’s  locale. 

From this classification of system effects two notable reasons can be
derived for modifying the observational program that has been described. The
first reason is that operator differences may produce such variability in the
system's structural and procedural complexes that additional replications of
raids are necessary to obtain significant results. In other words, inherent
in this observational program are two untenable risks: namely, the risk of
having the cost of the program rise to prohibitive heights before significant
results are obtained and the complementary risk of having to terminate the
program before significant results are obtained in order to avoid astronomical
costs. These risks cannot be completely eliminated from any observational
program whose objective is to provide experimental information for estimating
$\pi(R,S)$. But can a set of random variables be found whose members have a
meaningful relation to the problem of estimating $\pi(R,S)$ and are adaptable
to an observational program in which these risks are reduced?

The second reason for modifying this observational program stems from
the desire to study $\pi(R,S')$ for different systems S' obtained from S
through simple modifications of its structural and procedural complexes
without having to design and execute another observational program for the
system S'. In terms of this reason for modifying the suggested observational
program, can a set of random variables be found whose members have a
meaningful relation to the problem of estimating $\pi(R,S)$ and are relatively
independent of the structural and procedural effects of S?

By considering $[r,S]$ (the raid-system phase space) we will see that
affirmative answers can be given to both of these questions. The raid-system
phase space, although conceptually trivial, is difficult to characterize
notationally. Essentially it consists of the following points:

(1) The instant $\tau_a$ of arrival of a for each $a \in r$.

(2) The instant $\chi_a$ of detection of a for each $a \in r$.

(3) The instant $\kappa_a$ of entry of a for each $a \in r$.

(4) The instant $\rho$ of assignment of a to b for each $a \in r$ and
$b \in S$. (If a is not assigned to b put $\rho = \infty$.)

(5) The instant $\psi$ of acquisition of a by b for each $a \in r$ and
$b \in S$. (If a is not acquired by b put $\psi = \infty$.)

(6) The instants $\xi$ of simulated missile launches by b at a for each
$b \in S$ and $a \in r$. (If b does not simulate a missile launch at a put
$\xi = \infty$.)

(7) The instant $\nu_a$ that a reaches its bomb release point.




In addition, for each $a \in r$, let $v_a(t)$ denote the velocity of a at the
instant t, and $\varphi_a(\alpha, \beta)$ the path of a between the time
instants $\alpha$ and $\beta$. Then if $[r,S]$ were known and if, for each
$a \in r$, $\varphi_a(\tau_a, \nu_a)$ and $v_a(t)$ for
$t \in (\tau_a, \nu_a)$ were known, the set of distances of a from b at the
instants of simulated missile launches could be computed and so an
observation of $N(z \mid r)$ generated using the formula $P(\delta)$ for
$P(b \mid a)$.

Let us look at the following random variables, which are system epoch times
and can be computed from $[r,S]$:

(1) $\chi - \tau$, the time from arrival to detection.

(2) $\kappa - \chi$, the time from detection to entry.

(3) $\rho - \kappa$, the time from entry to assignment.

(4) $\psi - \rho$, the time from assignment to acquisition.

(5) $\xi - \psi$, the time from acquisition to missile launch.

$\chi - \tau$ and $\xi - \psi$ are not as dependent upon the structural and
procedural complexes of S as are $\kappa - \chi$, $\rho - \kappa$, and
$\psi - \rho$. These random variables are composed of a waiting time
component and a service time component. For the random variables
$\kappa - \chi$, $\rho - \kappa$, and $\psi - \rho$ the waiting time component
is more significant than the service time component, and the waiting time
component is dominantly affected by the structural and procedural complexes
of S.

For example, consider the time from detection to entry,
$\kappa - \chi = W(\kappa - \chi) + S(\kappa - \chi)$, where
$W(\kappa - \chi)$ is the time spent in waiting from the instant of detection
to the instant system entry service commences and $S(\kappa - \chi)$ is the
length of the system entry service period. Among other things
$W(\kappa - \chi)$ depends upon the number of system entry service units and
upon the system's entry SOP, and is affected by the structural and procedural
complexes of the system. But these system effects assume a different role
when $S(\kappa - \chi)$ is considered.

It was through this series of modifications and for the reasons indicated
that the observational program previously sketched was moulded into one which
had as its primary objectives the study of certain service time distributions
and decision processes associated with a MISSILE MASTER System. We are now
in a position to give a more particularized account of this observational
program: especially of the sequential sampling plan used, the terminal
decisions made, and the statistical procedures used to make these terminal
decisions. I shall do this only for that part of the observational program
aimed at the study of system service time distributions. To do this for the
complementary part of the observational program, aimed at the study of system
decision processes, additional description of a MISSILE MASTER System would
be required.

To begin, let T denote one of the system service times and
$F(t \mid r,S)$, $t > 0$, the distribution function of T for the raid
$r \in R$ and the system S; in other words,
$F(t \mid r,S) = \Pr(T \le t \mid r,S)$. Translated into technical terms, one
objective of the observational program was to specify the family of
distribution functions $\{F(t \mid r,S) \mid r \in R\}$.




Prior to experimentation with a MISSILE MASTER System a survey was
initiated to uncover possible characteristics of this family of distribution
functions from analogous studies carried out on similar service units.
These were studies like the typical time and motion investigations performed
by industrial engineers. In addition a time and motion study using full
scale wooden models of components of a MISSILE MASTER System was conducted
for the same purpose. From these efforts, and prior to any experimentation
with a MISSILE MASTER System, the following tentative hypotheses were
constructed about the family $\{F(t \mid r,S) \mid r \in R\}$ of distribution
functions:


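(1) $F(t \mid r,S) = \int_0^t f(u \mid r,S)\,du$, where

$$f(u \mid r,S) = \frac{1}{\sigma\sqrt{2\pi}}\, u^{-1} \exp\left[-(2\sigma^2)^{-1}(\ln u - \mu)^2\right]$$

and $\mu = \mu(r,S)$, $\sigma = \sigma(r,S)$.

(2) $\mu(r,S) = \mu_0 + \mu_1 n(r) + \mu_2 v(r) + \mu_3 h(r)$, where

$$v(r) = (n(r))^{-1} \sum_{a \in r} \int_{\tau_a}^{\nu_a} v_a(t)\,dt, \qquad h(r) = (n(r))^{-1} \sum_{a \in r} \int_{\tau_a}^{\nu_a} h_a(t)\,dt,$$

and $h_a(t)$ denotes the height of $a \in r$ at the instant t.

(3) $\sigma(r,S) = \sigma(r',S)$ for $r, r' \in R$.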


We call two raids $r, r' \in R$ equivalent provided n(r) = n(r'),
v(r) = v(r') and h(r) = h(r'). This equivalence relation decomposes R into
disjoint and exhaustive subsets, and one and only one of these subsets
corresponds to each triple (n,v,h). The sampling plan for the first stage of
experimentation consisted of specifying k triples
$(n_1, v_1, h_1), (n_2, v_2, h_2), \ldots, (n_k, v_k, h_k)$, and selecting a
raid r from the k equivalence classes of R corresponding to each of these
triples for which $\varphi_a$ was a straight line path between the instants
$\tau_a$ and $\nu_a$ for each $a \in r$, and both $v_a(t)$ and $h_a(t)$ were
constant in the interval $(\tau_a, \nu_a)$ and identical for each $a \in r$.
For each r specified in the first stage of the sampling plan the n(r)
observations of T (obtained from the experiment by flying r against S) were
used to estimate the parameters $\mu$ and $\sigma$ in the log-normal
distribution by the maximum likelihood method. The hypothesis that the
observed sample came from a log-normal distribution specified by these
estimated parameters was tested using the $\chi^2$-test. For those raids for
which this hypothesis was not rejected, the hypothesis that the logarithms of
the service times associated with each such raid had the same variance was
tested by using Bartlett's homogeneity of variance test. Finally, for the
maximal subset of raids for which neither of these two hypotheses was
rejected, normal regression theory was applied to examine the means of the
logarithms of these service times as functions of the indicated raid
parameters.
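
Two of the computations named in this stage can be sketched briefly: the
maximum likelihood estimates of the log-normal parameters, and Bartlett's
statistic for equal variances of the log service times across raids. This is
a minimal sketch, not the program actually used; the function names and the
service times are hypothetical, and the goodness-of-fit and regression steps
are omitted.

```python
import math


def lognormal_mle(times):
    """Maximum likelihood estimates (mu, sigma) of a log-normal sample."""
    logs = [math.log(t) for t in times]
    n = len(logs)
    mu = sum(logs) / n
    # The ML estimate of the variance divides by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma


def bartlett_statistic(samples):
    """Bartlett's statistic for equal variances of the log service times.

    Under the hypothesis of equal variances it is approximately chi-square
    distributed with (number of raids - 1) degrees of freedom.
    """
    groups = [[math.log(t) for t in s] for s in samples]
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = []
    for g in groups:
        mean = sum(g) / len(g)
        variances.append(sum((x - mean) ** 2 for x in g) / (len(g) - 1))
    total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total - k)
    numerator = (total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (total - k)) / (3 * (k - 1))
    return numerator / correction


# Hypothetical service times (seconds) observed for three raids.
raid_times = [
    [12.1, 15.3, 9.8, 20.4, 13.7],
    [18.2, 25.9, 16.4, 30.1, 22.5],
    [8.9, 11.2, 7.5, 14.8, 10.1],
]
for times in raid_times:
    print(lognormal_mle(times))
print(bartlett_statistic(raid_times))
```

A large value of the statistic relative to the chi-square reference
distribution would count against the hypothesis of a common variance.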




These terminal decisions were made at the end of each stage of the
sampling plan using the statistical procedures mentioned. Based on the
terminal decisions that occurred, the partition of R into disjoint and
exhaustive subsets was left unchanged, or was made finer by adding new
raid parameters to be investigated, or was made coarser by dropping
parameters that appeared to be inconsequential. The next stage of the sampling
plan was then constructed to reflect these changes.

No formal statistical procedures were followed to arrive at the
decision to continue or discontinue experimentation with a system service
time. This was done on an intuitive basis which reflected the experimental
results obtained from previous stages. However, experimentation
with system service times that continued through five successive stages
was discontinued regardless of the terminal decisions that occurred.

Using this observational program and the stage by stage truncated
sampling plan, terminal decisions and statistical decision procedures
described, the service time distributions and decision processes alluded
to have been studied using a MISSILE MASTER System. They have been
synthesized with the aid of a digital computer into representations of
systems S and S' (obtained from S through modifications of its structural
and procedural complexes) so that $\pi(R,S)$ and $\pi(R,S')$ can be estimated.


I have discussed an approach to the problem of estimating a response
surface associated with a complex stochastic structure. This approach
involves reducing the primary estimation problem to a number of relevant
secondary estimation problems which are partially amenable to classical
statistical studies. The results of such studies can then be synthesized
into an estimate of the response surface. It is an approach which possesses
a certain intuitive appeal; it has evolved and is continuing to develop
in almost every area of scientific research which involves the investigation
of complex stochastic structures.

This  approach,  however,  does  not  fit  comfortably  into  any  current 
statistical  theory.  The  more  "global"  types  of  statistical  procedures 
demanded  by  it  have  not  been  invented.  Nevertheless,  the  application 
of  this  method  leads  to  studies  of  random  variables  and  their  distribution 
functions  which  can  be  subjected  to  sophisticated  measurement  and  statis¬ 
tical  analysis,  and  thus  to  precise  knowledge  about  the  stochastic 
structure  in  the  "small". 

But  it  must  be  kept  in  mind  that  because  of  the  complicated  structural 
context  by  which  these  random  variables  are  related  to  each  other,  a 
representation  of  this  structural  relation  will  possess  a  considerable 
degree  of  unfaithfulness.  This  may  lead  to  a  lack  of  precise  information 
about  the  stochastic  structure  in  the  "large"  due  to  distortion  of  the 
estimate  of  the  response  surface  obtained  from  the  representation 
because  of  its  unfaithfulness. 


SOME STATISTICAL ASPECTS OF PREFERENCE AND RELATED TESTS

C. I. Bliss

The Connecticut Agricultural Experiment Station
and Yale University

INTRODUCTION. A cursory review of the recent literature on sensory
testing netted more than 90 references, most of them within the last five
or ten years. They range from the highly practical and even naive to
excursions in mathematical statistics. The restriction of my title to
preference testing narrows the field, but not enough. Sensory tests range
from the checking of individual preferences in questionnaires to ratings by
expert panels, as in the professional judging of tea or milk (Fenton, 19^7).
I will limit myself to comparative tests in the middle of this range, where
the subject bases his verdict on a subjective criterion. In preference
testing, our attention is directed as much to the population of which our
subjects are a sample as to the materials being compared, so that methods
designed for small expert panels may be quite impracticable with the larger
numbers that supposedly represent a given population.

EXPERIMENTAL DESIGN. Experimental preference tests should be restricted
to what we may call the same sensory dimension, avoiding comparisons between
diverse items such as peaches, salmon, sauerkraut and milk in the same series
(Peryam and Haynes, 1957). Preferences between different kinds of fruit
or between different varieties of the same kind would more nearly fit the
pattern I propose discussing. Since they are comparative judgements between
two or more similar items, we need not rate them on a "hedonic" scale
ranging from "dislike very much" to "like very much" in five or ten or more
steps. Despite their widespread use, individual ratings of this type
complicate a preference test unnecessarily. Both the mean preference for the
several items and their spread over the rating scale will differ from one
subject to another, introducing differences in both the average response
and the variance. Because of their lesser efficiency per aliquot, I would
also avoid triangular and duo-trio tests in preference studies (Gridgeman,
1955).


Within these restrictions, what kinds of comparative tests seem to me
best for preference studies? If we have just two items to compare, we might
ask, for example, "Between these two varieties of peach, which do you
prefer?" If we have three or more varieties, we might present them in pairs,
so that the subject could compare each variety separately with every other
variety in a so-called paired comparison (Bradley, 1953; Jackson and
Fleckenstein, 1957). Given four varieties A to D, for example, each subject
would compare them in six pairs: A-B, A-C, A-D, B-C, B-D, and C-D. As each
pair is presented, we might ask additionally "Is your preference slight,
moderate or strong, or in reality non-existent?" (Scheffé, 1952; Bliss et al,
1956; Carroll, 1958). Although differences between subjects in the strength
of their preferences may introduce heterogeneity in the error, the gain in
information will often justify this risk.
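As a small illustration of how quickly the number of pairs grows with the number of items, the pairs can be enumerated mechanically. This is only a sketch; the function name `all_pairs` is an illustrative choice and not part of any procedure described in the references.

```python
from itertools import combinations

def all_pairs(items):
    """Return every unordered pair of items: t(t-1)/2 pairs for t items."""
    return list(combinations(items, 2))

pairs = all_pairs("ABCD")
print(pairs)       # the six pairs A-B, A-C, A-D, B-C, B-D, C-D
print(len(pairs))  # 6
```

With seven varieties the same call gives 21 pairs per subject, which is one reason incomplete designs become attractive for longer series.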




An alternative design when three or more items are to be compared is
to ask the subject to rank them in sequence from the most to the least
preferred (Bliss et al, 1943, 1953; Greenwood and Salerno, 1949). To avoid
bias from the order of presentation, the order can be randomized for each
subject or balanced with a Latin square, the rows representing subjects or
sessions, the columns order of presentation, and letters the items to be
compared. Since the design should facilitate comparisons within a selected
set of stimuli, ranking is most effective when these are qualitatively
similar. If the critical stimuli are qualitatively dissimilar, the subject
may have less difficulty in choosing between the two members of a pair than
in ranking three or more in order. Under these circumstances paired
comparisons would be preferred.
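One way to balance the order of presentation with a Latin square is sketched here, assuming a cyclic square whose rows, columns and labels are then shuffled. The function names are hypothetical, and the shuffling does not sample uniformly from all Latin squares; it only guarantees that every item appears once in each row (subject or session) and once in each column (position).

```python
import random

def cyclic_latin_square(items):
    """Row i is the item list rotated left by i places."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]

def randomized_latin_square(items, seed=None):
    """Shuffle the labels, rows and columns of a cyclic Latin square."""
    rng = random.Random(seed)
    labels = list(items)
    rng.shuffle(labels)                   # random assignment of items to letters
    square = cyclic_latin_square(labels)
    rng.shuffle(square)                   # permute rows (subjects or sessions)
    cols = list(range(len(labels)))
    rng.shuffle(cols)                     # permute columns (order of presentation)
    return [[row[c] for c in cols] for row in square]

def is_latin(square):
    """True if every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({row[c] for row in square} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok

plan = randomized_latin_square("ABCD", seed=1)
for subject, order in enumerate(plan, start=1):
    print(subject, " ".join(order))
```

A square of this kind balances position only; it does not balance which item immediately precedes which.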

Ranking a series of three to five or six varieties or "treatments" may
work well, but with longer series sensory fatigue can blunt the subject's
ability to discriminate. If the testing of all possible pairs requires
too many replicates, ranking in incomplete balanced blocks may be the
solution (Hopkins, 1954; Murphy et al, 1957). In one of these known as the
Youden square, each row, representing the order of presentation, contains
all varieties, and each column the varieties compared in one session by
one subject. Within columns every variety occurs equally often with every
other variety in the series. The two Youden squares in Table 1* provide
for the testing of 7 varieties in groups of three and of four, the upper and
lower sections together forming a 7x7 Latin square (Youden, 1940). A
simplified rank analysis for incomplete block designs has been described by
Dykstra (1956).
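The balance property of the two Youden squares can be checked directly. The sketch below assumes that the rows of Table 1 are read as cyclic shifts of the letters A to G (shifts 0, 1, 3 for the upper section and 2, 4, 5, 6 for the lower); the names `rows` and `pair_counts` are illustrative only.

```python
from collections import Counter
from itertools import combinations

ITEMS = "ABCDEFG"
UPPER_SHIFTS = (0, 1, 3)      # rows 1-3 of Table 1, read as cyclic shifts
LOWER_SHIFTS = (2, 4, 5, 6)   # rows 4-7 of Table 1

def rows(shifts):
    """Each row is the alphabet rotated left by the given shift."""
    return [[ITEMS[(s + c) % 7] for c in range(7)] for s in shifts]

def pair_counts(square):
    """Count how often each pair of varieties shares a column (replicate)."""
    counts = Counter()
    for col in zip(*square):
        counts.update(combinations(sorted(col), 2))
    return counts

upper, lower = rows(UPPER_SHIFTS), rows(LOWER_SHIFTS)
print(set(pair_counts(upper).values()))   # {1}: every pair meets once
print(set(pair_counts(lower).values()))   # {2}: every pair meets twice
```

Stacking the two sections gives a square in which every column contains all seven letters, which is the Latin square property stated in the text.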

The scope and efficiency of many preference tests can be enlarged with
a factorial design of the treatments. In a 2x2 factorial, for example,
American and Dutch process cocoa syrups were prepared by both the "hot" and
"cold" methods and the four combinations tasted in a paired comparison by
each of 30 subjects (Reid and Becker, 1956). Only the interaction proved
significant, leading to the recommendation that the "hot" method be used
for American cocoa and the "cold" method for Dutch cocoa. Other factorial
taste tests include a 2x2 factorial on pesticide flavors in apples (Bliss
et al, 1956), a 2x3 palatability test on kale (Greenwood and Salerno, 1949),
and a 4x2x2 taste test on jam (Gridgeman, 1956). As the number of factors
is increased, the experiment may be kept within bounds with a fractional
factorial, as shown by Carroll (1958) for five formulation variables of a
pudding, each at two levels. By selecting a particular set of 16 from the
32 possible paired comparisons and a Scheffé rating for the degree of
preference, she measured the effect of each variable with a marked gain in
efficiency.


* Tables are to be found at the end of this article.




A preference test, of course, should also be controlled in its
environment, in preventing collusion between subjects, and in uniformity of
the samples, all of which are noted in the literature. Even time of day may
be important (Harries, 1953).

Ideally, the subjects in a preference test are a random sample from
the population whose preferences we are trying to measure. In some
experiments this ideal can be approximated, as in tests reported by Pangborn
et al (1957), but more often we may be limited to personnel available to the
testing laboratory, who in time may become seasoned as subjects. Training
or practice sessions have so increased the consistency and sensitivity of
the ratings in some sensory tests (Bennett, 1956) that a practice session
may be warranted even in large scale preference testing. Since we wish to
infer the preferences of a theoretically unlimited population from our
sample of subjects, larger panels may be justified than in other types of
tests. Among the studies on panel selection are comparisons with consumer
surveys (Murphy et al, 1958), the sequential testing of prospective subjects
(Bradley, 1953; Armitage, 1957), and the relation of screening tests with
water solutions to judging ability for foods (Mackey and Jones, 1954). If
the test material or the total number of responses is limited, we can
broaden our sampling by testing each subject only once, but for precise
comparisons, a design giving two or more complete replicates from each
individual would be preferred.

If the experimental procedure is satisfactorily controlled, the test
materials are uniform, and the motivation of the subjects is adequate,
inconsistency in a subject's preferences in a sensory test may have two
different explanations. All concentrations of a critical component in the
test materials may fall below the subject's threshold of perception, so that
his choices are essentially random. Alternatively, the actual difference
between two concentrations of a critical component where both are above his
threshold may not exceed the just perceptible difference between them, so
again his choice is random. Although both factors are important, I shall
consider here only the threshold of perception and its distribution, where
I have been struck by some analogies with experiments on insecticides and
drugs.


CONTINUOUS DISTRIBUTIONS OF SENSORY THRESHOLDS. Let us suppose that
we are determining preferences for samples of orange juice, which differ
primarily in their content of sugar or of acid. We will further assume
that the differences between samples are greater than the just perceptible
sensory difference, if the subject can detect the sugar or acid at all.

A subject with a threshold for the critical component that falls below
that of the sample with the smallest concentration will be able to
distinguish one juice from another. Whatever may be the level of sugar or
acid which he prefers, whether high, low or in the middle, his preferences
in replicate tests can be internally consistent because he can recognize
the differences between all of them. By contrast, a subject with a sensory
threshold above the concentration of sugar or acid in all or most of the
sample juices will be unable to separate those falling below his threshold,
and will rank them at random or by some secondary factor. His replicate
ratings will be internally inconsistent. Hence, it would be useful to
know how the sensory thresholds for primary taste sensations such as sweet,
sour, salt and bitter, are distributed in a sample of tasters.




Two experiments in this area have concerned water solutions of sucrose
(Baker et al, 1954) and of tartaric acid (Baker et al, 1958). Each series
of five and ten test solutions respectively represents a geometric
progression in steps of two, so large a log interval that only the median
threshold can be estimated profitably for each subject. Each subject was
asked to identify the tube containing the test solution when it was paired
against water, and he sampled each concentration in his critical range 15 or
more times over a number of sessions. Concentrations below his sensory
threshold have an expectation of 50% correct answers, which increases to 100%
as the concentrations reach and exceed the subject's threshold. The
statistical problem is two-fold: (1) how can we compute the median threshold
concentration for each subject, and (2) how are these thresholds
distributed in the population from which these subjects may be considered a
random sample?

In animal tests, many drugs and toxicants produce no perceptible change
in the organism until they reach a critical concentration, when an
all-or-none reaction occurs. The critical dose which just produces a response
is a measure of its threshold level at the time of the test, although in
repeated trials with the same dose the animal may react on some occasions
and not on others, as has been reported, for example, in the convulsive
response of individual rats to the drug thujone (Sampson and Fernandez, 1939).
If the animal is tested repeatedly in, say, 20 trials, with doses ranging
from one to which it does not respond to one to which it responds invariably,
the percentage of positive reactions commonly plots against the log-dose
of drug as a symmetrical sigmoid curve. A similar relation might be
expected with the thresholds for a taste stimulus, as has been shown for
sucrose by Gridgeman (1958).

If the variation in the threshold results from a number of factors
acting concurrently, some raising and others lowering the level, a suitable
model would interpret the sigmoid relation as an integrated normal curve.
The stimulus interpolated from this curve at a net response of 50% would
estimate the individual's threshold as the median of a normal distribution.
Our simplest procedure is to convert each sigmoid curve to a linear form
by plotting the standardized normal deviate or probit for each observed
net percentage against the log-concentration of the stimulus. The normal
curve is paralleled so closely by the logistic over most of its range that
substantially the same result can be obtained with either function
(Gridgeman, 1958).


Taste tests have one complication, which is implied by the term "net
percentage response". When the subject is unable to discriminate, we start
with an expectation of 50 instead of 0 percent of correct answers. This is
analogous to an insecticide experiment with 50% natural mortality, where
the natural mortality and that attributable to the toxicant act independently
of each other. In a taste test each response between 50 and 100% of correct
identifications can be corrected for a base line of 50% with the
entomologist's adjustment for natural mortality by computing

\[ \text{Net \% response} = \frac{\text{observed \%} - 50}{1 - 0.50} \]
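The adjustment can be checked against the counts printed in Table 2. In this minimal sketch the function name `net_percent` is illustrative, and setting observed values below 50% to a net response of zero is an assumption read off the first row of that table.

```python
def net_percent(correct, total, chance=0.5):
    """Correct a percentage of right answers for a guessing rate of `chance`."""
    observed = correct / total
    net = (observed - chance) / (1.0 - chance)
    return max(0.0, 100.0 * net)   # assumption: values below chance count as 0

# Counts for taster "C" in Table 2: 8/19, 10/18, 12/18, 15/15
for correct, total in [(8, 19), (10, 18), (12, 18), (15, 15)]:
    print(round(net_percent(correct, total)))   # 0, 11, 33, 100 in turn
```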




When percentages of more than 50% are observed at the smaller concentrations
in a graded series and are not succeeded by larger values, they may exceed
50% by chance and can be omitted as not relevant to the experiment.

For analysis, each net response in the intermediate zone between 0 and
100 percent is transformed to its empirical probit by a suitable table, such
as that given by Fisher and Yates (1957). Since responses of 0 and 100
percent have empirical probits of minus and plus infinity respectively, we may
adapt Berkson's (1953) useful dodge for our preliminary estimate and replace
the first net zero percent below the intermediate zone and the first 100%
above the intermediate zone by the percentages 100/N and 100(N-1)/N
respectively, where N is the total number of pairs sampled by a single subject
at a given concentration. These are then transformed to empirical probits.
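In place of a printed table, the empirical probit can be computed from the inverse normal distribution. The sketch assumes the usual convention that a probit is the normal deviate plus 5; `empirical_probit` is a hypothetical helper that applies the 100/N and 100(N-1)/N substitution to any 0 or 100 it is given, so choosing which end points to keep remains with the caller.

```python
from statistics import NormalDist

def empirical_probit(net_pct, n):
    """Probit (normal deviate + 5) of a net percentage based on n pairs.

    Net responses of 0 and 100 are replaced by 100/n and 100(n-1)/n.
    """
    if net_pct <= 0:
        net_pct = 100.0 / n
    elif net_pct >= 100:
        net_pct = 100.0 * (n - 1) / n
    return 5.0 + NormalDist().inv_cdf(net_pct / 100.0)

# Taster "C" in Table 2: (net %, N) for solutions 1 to 4
for net, n in [(0, 19), (11, 18), (33, 18), (100, 15)]:
    print(round(empirical_probit(net, n), 1))   # 3.4, 3.8, 4.6, 6.5 in turn
```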

Given the coded log-concentrations (x) and the corresponding empirical
probits (y), we may compute a trial straight line for each individual by
simple least squares without weighting. If the slopes (b) of these lines
for subgroups of two or more subjects agree sufficiently, they can be
combined into improved, more stable estimates $b_c$. The subjects in Baker's
experiments were grouped primarily by the number of responses in the
intermediate zone, those tested with sucrose into two homogeneous groups and
those with tartaric acid into three groups, with significantly different
slopes. Each individual's threshold in coded log-units may be estimated
provisionally from the unweighted means, $\bar{x}$ and $\bar{y}$, and $b_c$ as

\[ x_{50} = \bar{x} + (5 - \bar{y})/b_c \]

where $x_{50}$ is the coded mean of a log-normal sample with an estimated
standard deviation $s = 1/b_c$. The calculation of the provisional estimate
is illustrated for subjects "C" and "N" in Table 2.
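The provisional calculation can be followed step by step with the probits printed in Table 2. This is a sketch under the assumption that the line is an ordinary unweighted regression of y on x; the function name and the `pooled_slope` argument are illustrative, and the pooled slopes 1.120 and 0.725 are taken from the table as given.

```python
def provisional_threshold(x, y, pooled_slope=None):
    """Unweighted least-squares slope and the coded log-concentration at probit 5."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)                       # [x^2]
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) # [xy]
    b = sxy / sxx
    slope = b if pooled_slope is None else pooled_slope
    x50 = x_bar + (5.0 - y_bar) / slope
    return b, x50

# Taster "C" (solution 5 omitted) and taster "N", probits as printed in Table 2
slope_c, x50_c = provisional_threshold([1, 2, 3, 4], [3.4, 3.8, 4.6, 6.5],
                                       pooled_slope=1.120)
slope_n, x50_n = provisional_threshold([1, 2, 3, 4, 5], [3.4, 5.1, 5.5, 6.1, 6.5],
                                       pooled_slope=0.725)
print(round(slope_c, 2), round(x50_c, 3))   # 1.01 2.879
print(round(slope_n, 2), round(x50_n, 3))   # 0.72 2.559
```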

For a definitive result, these initial estimates are improved
iteratively by maximum likelihood. This involves computing the expected probit
Y at each x, replacing the empirical probits with their estimated working
probits y, and weighting each y by $w = N[Z^2/Q(P+1)]$. The term in brackets
is the weighting coefficient for 50% natural mortality, which has been
tabled by Finney (1952). The first weighted regression will often answer
the experimenter's requirements. Its calculation is illustrated in Table
3 with the data for the two subjects in Table 2. Additional iterations
have been computed from the present data, omitting any individuals
with less than two responses in the intermediate zone.
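The weight w = N Z^2 / [Q (P + 1)] can also be computed directly instead of being read from a table. The sketch below assumes the standard notation, with P the normal probability at the expected probit Y, Q = 1 - P, and Z the normal ordinate at that point. The working probit formula y = Y + (p - P)/Z, applied to the net proportion p, is likewise an assumption (the text does not spell it out), although it agrees with the entries printed in Table 3.

```python
from statistics import NormalDist

STD = NormalDist()

def weight_and_working_probit(expected_probit, correct, total, chance=0.5):
    """Weight and working probit for one concentration of one subject."""
    u = expected_probit - 5.0            # normal deviate of the expected probit Y
    P = STD.cdf(u)
    Q = 1.0 - P
    Z = STD.pdf(u)
    # chance / (1 - chance) equals 1 for a 50% guessing rate, giving Q(P + 1)
    w = total * Z ** 2 / (Q * (P + chance / (1.0 - chance)))
    p_net = (correct / total - chance) / (1.0 - chance)
    y_work = expected_probit + (p_net - P) / Z   # assumed working probit formula
    return w, y_work

# Taster "C", solutions 2 to 4: expected probits from Table 3, counts from Table 2
for Y, correct, total in [(4.0, 10, 18), (5.1, 12, 18), (6.2, 15, 15)]:
    w, y = weight_and_working_probit(Y, correct, total)
    print(round(w, 1), round(y, 1))   # 1.1 3.8, then 4.0 4.6, then 2.6 6.8
```

The weighted regression then uses the sums of w, wx and wy in the same way as the unweighted sums in the provisional estimate.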

Agreement with our model can be checked by $\chi^2$. The separate slopes
for the subjects tested with both sucrose and tartaric acid showed better
than average agreement with the combined slopes for their respective groups,
but the variation of y about several of the curves with lesser slopes was
significantly heterogeneous. A composite $\chi^2$ over all tests, however,
showed adequate agreement with the underlying hypothesis ($\chi^2 = 82.77$,
n = 75).
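To see why a value of 82.77 counts as adequate agreement, its tail probability can be approximated without tables. The sketch assumes that n = 75 denotes the degrees of freedom and substitutes the Wilson-Hilferty cube-root normal approximation for an exact chi-square calculation; neither choice comes from the paper itself.

```python
from statistics import NormalDist

def chi2_upper_tail(chi2, df):
    """Approximate P(X >= chi2) for chi-square with df degrees of freedom
    (Wilson-Hilferty cube-root normal approximation)."""
    k = 2.0 / (9.0 * df)
    z = ((chi2 / df) ** (1.0 / 3.0) - (1.0 - k)) / k ** 0.5
    return 1.0 - NormalDist().cdf(z)

p = chi2_upper_tail(82.77, 75)
print(round(p, 2))   # roughly 0.25, well above the usual significance levels
```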




From the weighted means $\bar{y}$ and $\bar{x}$, the coded threshold
concentration $x_{50}$ for sugar or acid was recomputed for each subject with
the relevant combined slope $b_c$. To test whether these log-thresholds were
distributed normally, each series has been arranged in increasing order and
plotted in Figure 1, where the ordinate is the corresponding rankit or
expected average deviate for a sample of N (= 15 and 24) from a normal
population with zero mean and unit standard deviation (Fisher and Yates,
1957, Table XX). Since the trend of each series of plotted points is
substantially linear, we may consider the distribution of these two taste
thresholds as essentially log-normal.
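Rankits can be computed numerically where the Fisher and Yates table is not at hand. The sketch below integrates the expected value of each normal order statistic by the trapezoidal rule; the step size and integration limits are arbitrary choices for illustration, not values taken from any reference.

```python
from math import comb
from statistics import NormalDist

STD = NormalDist()

def rankits(n, step=0.01, limit=8.0):
    """Expected values of the order statistics of n standard normal
    deviates, smallest first, by trapezoidal integration."""
    steps = int(round(2 * limit / step))
    out = []
    for i in range(1, n + 1):
        coef = i * comb(n, i)                 # n! / ((i-1)! (n-i)!)
        total = 0.0
        for k in range(steps + 1):
            x = -limit + k * step
            F = STD.cdf(x)
            f = x * coef * F ** (i - 1) * (1.0 - F) ** (n - i) * STD.pdf(x)
            total += f if 0 < k < steps else f / 2.0
        out.append(total * step)
    return out

print([round(r, 2) for r in rankits(7)])
```

Plotting the ordered log-thresholds of the 15 or 24 subjects against `rankits(15)` or `rankits(24)` gives the kind of diagram described for Figure 1; an approximately straight trend suggests normality.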

The distribution of thresholds could play an important role in testing
preferences between similar foods in a series. If the concentration of a
critical component in the samples offered to a panel were less than some
of their thresholds, it could influence the choice of only those individuals
with a small enough threshold to taste this component. Hence, inconsistencies
in a subject's response may depend upon the relation of his sensory
threshold for the critical factor to that of the population whose preferences
are being sampled.

DISCONTINUOUS DISTRIBUTIONS OF THE SENSORY THRESHOLD. The variations
of sensory thresholds between individuals may not be continuous and
substantially normal or Gaussian, as in the above tests with sucrose and
tartaric acid. Instead a clear-cut discontinuity may divide the population
into two categories of tasters and non-tasters. Perhaps the best known
case is that of solutions of phenyl-thio-carbamide (PTC), which to some
people are exceedingly bitter and to others tasteless. Tasters can be
separated from non-tasters by a solution of about 1/5 molar. In a study of
some 3700 individuals, the proportion of tasters in the population was about
71% (Cotterman and Snyder, 1939). Geneticists have traced the dichotomy
to a single pair of alleles. Individuals homozygous for the recessive gene
find PTC tasteless and individuals with one or two copies of its dominant allele
find it extremely bitter. Although in saturated solutions (4/5 molar) all
individuals can detect some bitterness (Blakeslee, 1932), the frequency
distribution of the taste threshold is sharply bimodal, with a fair spread
between tasters and little or no spread between the non-tasters.

Blakeslee and his associates (1935, 1948) have extended their studies
to  olfactory  as  well  as  to  taste  reactions.  Individuals  varied  widely  not 
only  in  their  thresholds  but  also  in  their  preferences  as  to  whether  a  given 
odor  was  pleasant,  indifferent  or  unpleasant.  If  the  preference  for  one 
of  two  varieties  of  a  given  food,  for  example,  were  to  depend  upon  a  well- 
marked  bimodality  in  the  taste  or  olfactory  threshold,  the  situation  would 
parallel  that  separating  the  placebo  reactors  from  the  placebo  non-reactors 
in  experiments  with  drugs. 

In a clinical experiment that is especially relevant, four preparations
A to D were compared as headache remedies (Jellinek, 1946). Tablets
of the four preparations, identical in color, shape, size and taste but
with the compositions shown in Table 4, were distributed to headache-prone
patients under code designations concealed from both the patients
and their cooperating physicians. They were given in successive two-week
periods to a total of 199 patients in an order determined for each group of
49 or 50 by the rows in a Latin square. Each patient was to take a tablet
every time he developed a headache and to record whether his headache was
relieved within half an hour. When the results were analyzed, the mean
success rates for the four preparations were A 0.84, B 0.80, C 0.80,
and D 0.52, the first three giving relief significantly more often
than the placebo (D) but not differing among themselves.

Jellinek then examined the frequency distribution of the number of
successes reported for the placebo and found it U-shaped, as in the two
series in Table 5 for patients reporting five headaches in a two-week
period. On the hypothesis that patients who never reacted to the placebo
had physiological headaches and those who were relieved by the placebo on
one or more occasions had psychological headaches, he divided his subjects
into two series, numbering 79 and 120 respectively. Their mean success
rates and the combined analysis of variance for each series are given in
Table 6. The placebo non-reactors discriminated between the three true
drugs significantly but the placebo reactors gave all three the same success
rate as the placebo itself (0.86). Jellinek concluded that "discrimination
among remedies for pain can be made only by subjects who have a pain on
which the analgesic can be tested".

SEPARATION OF SENSITIVE AND INSENSITIVE SUBJECTS. How can this
principle be applied to the testing of preferences where we have no "placebo"
to separate the sheep from the goats? One possible criterion is consistency
in the response of each subject. This would require replication within
subjects, so that the mean ratings of each individual can be tested against
an error term based upon his own inconsistency. The direction of preference
or its additivity would not be a criterion, only the requirement that a
subject designated as "having preferences" must show some stability in at
least one choice in replicated tests. In parallel analyses, all variances
between "treatments" for those without preferences should be of the same
magnitude as the error, but for those with preferences one or more
comparisons should be significant or approach significance.

The relative size of the two groups would itself be an important
outcome of the experiment and we could minimize or omit any preliminary
screening of the taste panel. Within the sensitive group, of course,
individuals may have sufficiently diverse preferences that no direct
comparisons between treatments are significant, but in this case a significant
interaction of treatment by subject would testify to a difference in opinion
as contrasted with no opinion. To decide whether the indifferent subjects
represent "taste blind" individuals, as in the PTC test, or the high
threshold end of a Gaussian distribution would probably require additional
experiments. In the latter case, we would expect the proportion of
non-tasters to be relatively unstable and more dependent upon the exact
concentration of the critical component in the test materials than if the
distribution of sensory thresholds were bimodal.




As an example, I will apply this procedure to a paired comparison on
the relative palatability of Cortland apples from trees which had been treated
with four different spray combinations in a single experimental orchard (Bliss
et al, 1956). Each spray mixture contained one of the two fungicides, thiram
(Th) or sulphur, and one of the two insecticides, lead arsenate (L) or
parathion, in a 2x2 factorial design. Apples from each treatment were chosen
at random from the fall harvest, washed in detergent suds, rinsed, cored
and quartered. These were then made into sauce and quick frozen, the yield
from one-quarter of each of ten apples providing sufficient test material
for a given treatment in a single taste session.

Twenty-five subjects, students and faculty at the University of
Connecticut, participated in six sessions arranged in three pairs, the second
session of each pair following two days after the first. In the first, third
and fifth sessions, the six possible pairs of the four treatments, all with
concealed identities, were presented to each taster in an order determined by
assigning him to one row in a 6x6 Latin square. In the second, fourth and
sixth sessions, he tasted the same sequence of pairs as in the preceding
session but with the order within each pair reversed. Each subject recorded
not only his preference within each pair but also whether the difference
was slight, moderate, large, or really non-existent.

The degrees of preference have been transformed to rankits for N = 7,
and analyzed in terms of Scheffé's extension of the Thurstone-Mosteller
model. This model postulates a subjective continuum within each subject
on which the sensations developed by the stimuli are arranged on a linear
scale, the sensations for each stimulus varying normally about a mid-point.
The six rankits in each of the six replicated tests for each subject, one
for each pair of samples, varied about zero with 36 degrees of freedom.
Six of these represent differences between the means of the paired stimuli
and 30 the remaining variability. From the six mean differences we can
isolate three factorial comparisons, each with one degree of freedom,
representing the preference between the fungicides thiram and sulphur, between
the insecticides lead arsenate and parathion, and their interaction. The
sum of squares for the remaining three degrees of freedom measures
non-additivity on the subjective continuum.
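The partition of the 36 degrees of freedom for one subject can be sketched as follows. The sketch assumes Scheffé's additive model without an order effect, ordinary least-squares estimates of the four treatment values, and scores signed so that a positive value favours the first-named treatment of a pair in every session. The scores shown are invented for illustration and are not data from the experiment; all names are hypothetical.

```python
from itertools import combinations

T = 4                                    # treatments numbered 0..3
PAIRS = list(combinations(range(T), 2))  # the six pairs

def decompose(scores, contrasts):
    """Split the sum of squares about zero for one subject.

    scores[(i, j)] lists the replicate scores for pair i-j (i < j),
    positive when treatment i is preferred.
    """
    r = len(scores[PAIRS[0]])
    mean = {p: sum(scores[p]) / r for p in PAIRS}

    def d(i, j):                         # antisymmetric mean difference
        if i == j:
            return 0.0
        return mean[(i, j)] if i < j else -mean[(j, i)]

    # least-squares treatment values on the subjective continuum
    alpha = [sum(d(i, j) for j in range(T)) / T for i in range(T)]
    ss = {}
    for name, c in contrasts.items():    # one degree of freedom each
        est = sum(ci * ai for ci, ai in zip(c, alpha))
        ss[name] = r * T * est ** 2 / sum(ci * ci for ci in c)
    ss["non-additivity"] = r * sum(      # three degrees of freedom
        (mean[(i, j)] - (alpha[i] - alpha[j])) ** 2 for i, j in PAIRS)
    ss["residual"] = sum(                # 6 * (r - 1) degrees of freedom
        (x - mean[p]) ** 2 for p in PAIRS for x in scores[p])
    return ss

# Treatment order: 0 = Th + L, 1 = Th + parathion, 2 = S + L, 3 = S + parathion
CONTRASTS = {
    "fungicide":   (1, 1, -1, -1),
    "insecticide": (1, -1, 1, -1),
    "interaction": (1, -1, -1, 1),
}

# Invented rankit scores for one subject, six replicates per pair
example = {
    (0, 1): [0.35, 0.00, 0.76, 0.35, -0.35, 0.35],
    (0, 2): [0.76, 1.35, 0.35, 0.76, 0.76, 0.00],
    (0, 3): [1.35, 0.76, 0.76, 1.35, 0.35, 0.76],
    (1, 2): [0.35, 0.00, 0.35, -0.35, 0.76, 0.35],
    (1, 3): [0.76, 0.35, 0.00, 0.76, 0.35, 0.35],
    (2, 3): [0.00, 0.35, -0.35, 0.35, 0.00, 0.35],
}
ss = decompose(example, CONTRASTS)
total = sum(x * x for v in example.values() for x in v)
error_ms = ss["residual"] / 30
for name in CONTRASTS:
    print(name, round(ss[name] / error_ms, 2))   # variance ratio on 1 and 30 d.f.
```

The five components add up to the total sum of squares about zero, which is a convenient check on the arithmetic.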

Each factorial effect for each subject, disregarding the direction of
the preference, was compared against his residual variability. In 11 of
the 25, no effect was significant at P ≤ 0.15, with P ≤ 0.20 for only five
of the 33 comparisons and no two of these in the same individual. These
subjects apparently were either indifferent or insensitive to any specific
flavors which might be associated with the four toxicants. Presumably,
their thresholds lay either in the upper end of a normally distributed
population of taste thresholds or in the upper portion of a bimodal
distribution.


Two analyses of variance have been computed in Table 7 for the 11
inconsistent and for the 14 consistent subjects. In agreement with our
hypothesis, no comparison in the first group is significant, although the
contrast between thiram and sulphur is twice its error variance (F = 2.22).
By contrast, the other 14 subjects differ significantly in their preferences
for the two direct factorial comparisons (rows 5 and 6), but their
disagreement is not sufficient to preclude a significant "vote" for thiram
in preference to sulphur (P < 0.025) and for lead arsenate in preference to
parathion (P < 0.05). The interaction in row 3, measuring the dependence of
the preference for the insecticide upon which fungicide was present (or
vice-versa), approaches significance. The assumption of an additive scale or
linear subjective continuum is justified by variance ratios in rows 4 and 8
of F < 1 or not significant. In comparison with the composite analysis in
the original paper, the present subdivision of the 25 subjects into two
groups has sharpened our test of the disagreement between those with a
consistent preference. Since each direct effect has been compared against its
interaction with subjects, its overall significance is somewhat smaller than
before.

SUMMARY. By their very name, preference tests measure the comparative
response of individual subjects to a series of two or more items, most
commonly in taste tests. When these can be presented to each subject in
pairs, scoring the direction and degree of preference between the two samples
provides a more pertinent criterion than rating each item separately on a
hedonic scale, which introduces a needless source of variation. When the
samples do not differ enough qualitatively to make ranking difficult, three
or more items may be presented in each set and the subject asked to rank
them in order of preference. The structure and order of presentation
within each set may represent an arrangement in randomized groups, Latin
squares, or in balanced incomplete blocks such as the Youden square.

Statistical methods are suggested for estimating the individual sensory
thresholds for a given stimulus and then describing their distribution. Of
other designs available for this purpose, the experimenter would be well
advised to consider sequential procedures. Experiments for measuring
thresholds are primarily of value in explaining the results of preference
tests. Where our objective is descriptive rather than explanatory, the
separation of subjects by the consistency of their replicated responses into
two series, one sensitive and the other indifferent, should prevent the
individuals without preferences from concealing the critical evidence of
those with preferences and thereby increase the efficiency of our
experiments. The proportion of individuals without preferences would then
constitute one outcome of the experiment.


Table 1. A 7x7 Latin square that divides into two Youden
squares. (Youden, 1940)

 Order of        Replicate No.           Letters occur together
 tasting      1  2  3  4  5  6  7        in same replicate

  1    1      A  B  C  D  E  F  G
  2    2      B  C  D  E  F  G  A        once
  3    3      D  E  F  G  A  B  C

  4    1      C  D  E  F  G  A  B
  5    2      E  F  G  A  B  C  D
  6    3      F  G  A  B  C  D  E        twice
  7    4      G  A  B  C  D  E  F
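The property stated in the caption of Table 1 can be checked mechanically. The Python sketch below is not part of the original paper. It takes the square as read from Table 1, with the letters that are illegible in this copy restored on the assumption that every row is a cyclic shift of A to G, and counts how often each pair of letters falls in the same replicate for the first three rows and for the last four.

```python
from collections import Counter
from itertools import combinations

# Rows of the 7x7 Latin square in Table 1; position in the string = replicate.
# Letters unreadable in this copy are restored assuming cyclic rows.
ROWS = [
    "ABCDEFG",
    "BCDEFGA",
    "DEFGABC",
    "CDEFGAB",
    "EFGABCD",
    "FGABCDE",
    "GABCDEF",
]


def is_latin(rows):
    """True if every letter occurs once in each row and each column."""
    letters = set(rows[0])
    rows_ok = all(len(r) == len(rows) and set(r) == letters for r in rows)
    cols_ok = all({r[c] for r in rows} == letters for c in range(len(rows)))
    return rows_ok and cols_ok


def pair_counts(rows):
    """Count how often each pair of letters shares a replicate (column)."""
    counts = Counter()
    for col in range(len(rows[0])):
        block = sorted(row[col] for row in rows)
        counts.update(combinations(block, 2))
    return counts


first_square = pair_counts(ROWS[:3])   # rows 1-3
second_square = pair_counts(ROWS[3:])  # rows 4-7

print(is_latin(ROWS))
print(set(first_square.values()), set(second_square.values()))
```

Under these assumptions every one of the 21 pairs of letters occurs together once in the first three rows and twice in the last four, which is the balance that makes each part usable as an incomplete block design on its own.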

Table 2. Provisional estimate of EC50 for tasters "C" and "N" from
paired difference tests against water of 5 concentrations of sucrose
increasing by multiples of 2 from 0.034% at x = 1. (Baker et al, 1954)

 Taster  Soltn.  No. (+)/  Net   Probit   Calculation in coded
           x      total     %      y      log-concentrations

  "C"      1      8/19      0     3.4     x̄ = 2.5, ȳ = 4.575
           2     10/18     11     3.8     [x²] = 5, [xy] = 5.05
           3     12/18     33     4.6     Slope for "C", b = 1.01
           4     15/15    100     6.5     Pooled slope b_c = 1.120
           5     15/15    100    Omit     EC50 = 2.879 (with b_c)

  "N"      1     10/19      5     3.4     x̄ = 3, ȳ = 5.32
           2     13/17     53     5.1     [x²] = 10, [xy] = 7.2
           3     16/19     68     5.5     Slope for "N", b = 0.72
           4     14/15     87     6.1     Pooled slope b_c = 0.725
           5     15/15    100     6.5     EC50 = 2.559 (with b_c)
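The arithmetic in the right-hand column of Table 2 can be retraced with a few lines of Python. This is a sketch, not the author's own computing scheme: the function name is hypothetical, the pooled slope b_c is simply taken from the table because its derivation is not given here, and the EC50 is assumed to be the coded log-concentration at which the fitted probit equals 5 (the probit for a 50 per cent response).

```python
def provisional_ec50(x, y, pooled_slope):
    """Unweighted sums of squares and products for one taster, and the EC50
    read off the line through (x_bar, y_bar) with the given pooled slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)               # [x^2] in Table 2
    sxy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))   # [xy] in Table 2
    return {
        "x_bar": x_bar,
        "y_bar": y_bar,
        "sxx": sxx,
        "sxy": sxy,
        "slope": sxy / sxx,
        # Probit 5 corresponds to a 50 per cent response.
        "ec50": x_bar + (5.0 - y_bar) / pooled_slope,
    }


# Empirical probits copied from Table 2; solution 5 is omitted for taster "C".
taster_c = provisional_ec50([1, 2, 3, 4], [3.4, 3.8, 4.6, 6.5], pooled_slope=1.120)
taster_n = provisional_ec50([1, 2, 3, 4, 5], [3.4, 5.1, 5.5, 6.1, 6.5], pooled_slope=0.725)

for name, result in (("C", taster_c), ("N", taster_n)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With the tabulated probits this reproduces the slopes of 1.01 and 0.72 and, to three decimals, the EC50 values of 2.879 and 2.559 shown in the table.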



Table 3. First weighted estimate of EC50 for subjects "C" and "N"
in Table 2, where Y = ȳ + b_c (x - x̄) from Table 2.

 Taster  Sol'n.  Expected  Weight  Working     wx       wy
           x        Y        w     probit y

  "C"      1       2.9       .0
           2       4.0      1.1      3.8       2.2      4.18
           3       5.1      4.0      4.6      12.0     18.40
           4       6.2      2.6      6.8      10.4     17.68
           5       7.4       .5      7.8       2.5      3.90

  "N"      1       3.9       .9      3.5        .9      3.15
           2       4.6      2.6      5.1       5.2     13.26
           3       5.3      4.5      5.5      13.5     24.75
           4       6.0      3.0      6.1      12.0     18.30
           5       6.6      1.3      7.3       6.5      9.49

 Statistic     "C"      "N"       Statistic     "C"       "N"

 Σw            8.2     12.3       [wxy]        7.837    10.993
 Σ(wx)        27.1     38.1       b            1.587      .770
 Σ(wy)        44.16    68.95      b'_c         1.670      .840
 x̄            3.305    3.098      [wy²]       13.350     9.171
 ȳ            5.385    5.606      B²          12.437     8.461
 [wx²]        4.938   14.283      χ²            .913      .710
                                  EC50         3.074     2.377
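The summary statistics at the foot of Table 3 follow from the weights and working probits in its upper part. The Python sketch below retraces them under stated assumptions: the function name is hypothetical, the combined slope b'_c is taken from the table rather than derived, B² is read as [wxy]²/[wx²], and the heterogeneity χ² is read as [wy²] - B², which is the reading that reproduces the tabulated figures.

```python
def weighted_probit_step(x, w, y, combined_slope):
    """One weighted regression step on working probits y with weights w."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    x_bar = swx / sw
    y_bar = swy / sw
    # Weighted sums of squares and products about the weighted means.
    swxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    swxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    swyy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    b_squared = swxy ** 2 / swxx
    return {
        "x_bar": x_bar,
        "y_bar": y_bar,
        "swxx": swxx,
        "swxy": swxy,
        "swyy": swyy,
        "slope": swxy / swxx,
        "chi2": swyy - b_squared,
        "ec50": x_bar + (5.0 - y_bar) / combined_slope,
    }


# Values copied from Table 3; solution 1 for "C" has weight .0 and is left out.
taster_c = weighted_probit_step(
    x=[2, 3, 4, 5], w=[1.1, 4.0, 2.6, 0.5], y=[3.8, 4.6, 6.8, 7.8],
    combined_slope=1.670,
)
taster_n = weighted_probit_step(
    x=[1, 2, 3, 4, 5], w=[0.9, 2.6, 4.5, 3.0, 1.3], y=[3.5, 5.1, 5.5, 6.1, 7.3],
    combined_slope=0.840,
)

for name, result in (("C", taster_c), ("N", taster_n)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Small differences in the last decimal place are to be expected, since the table was evidently computed from rounded intermediate values.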


Table 4. Clinical test on the comparative effectiveness
of headache remedies in patients assigned in equal numbers to
four groups. (Jellinek, 1946)

      Experimental design                  Drug composition

 Group    Successive 2-week periods
  No.       1     2     3     4            A - Ingredients a,b,c
   I        A     B     C     D            B -      "      a,c
   II       B     A     D     C            C -      "      a,b
   III      C     D     A     B            D - Placebo (pharmacologically
   IV       D     C     B     A                inert)

Table 5. Frequency distribution of "successes" with placebo
as reported by subjects who had taken drug D for 5 attacks of
headache.

 No. of headaches relieved         0     1     2     3     4     5

 No. of subjects, this study      22     1     5     7     8    16
  "   "    "      later study     27     0     1     5    10    19

Table 6. Combined analysis of success rates for the four
groups in Table 4.

                     Placebo non-reactors           Placebo reactors
 Subjects          No. of    Rates for drug      No. of    Rates for drug
                  subjects    A     B     C     subjects    A     B     C

 Mean rates          79      .88   .67   .77       120     .82   .87   .82

 Term                DF       MS      F            DF       MS           F

 Subjects            78      .181    2.08          119   [illegible]    1.86
 Drugs                2      .999   11.48            2   [illegible]    1.14
 Subjects x drugs   156      .087                  238   [illegible]



Table 7. Analysis of variance of a 2x2 factorial experiment
on off-flavor in apples sprayed with thiram (Th) or sulfur and
with lead arsenate (L) or parathion; from paired comparisons with
degrees of preference transformed to rankits. (Bliss et al, 1956)

 Row                         Inconsistent subjects     Consistent subjects
 No.  Comparison of           DF      MS       F        DF      MS       F

  1   Fungicides (Th)          1     .545     2.22       1    8.417    7.42*
  2   Insecticides (L)         1     .239      .97       1    3.740    4.91*
  3   Interaction ThxL         1     .004      .02       1     .681    3.07
  4   Non-additivity           3     .168      .68       3     .020     .09
  5   Tasters x Th            10     .193      .78      13    1.134    5.10**
  6      "    x L             10     .167      .68      13     .761    3.43**
  7      "    x ThxL          10  [illegible] 1.16      13     .132     .59
  8      "    x non-add.      30     .345     1.40      39     .164     .74
  9   Within tasters         330     .246              420     .222

 * P < 0.05, ** P < 0.001
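The F ratios for the consistent subjects in Table 7 show which error term each comparison was tested against. The Python sketch below is an illustration added here, not the authors' computation. It assumes, from the tabulated values, that each direct effect was divided by its own interaction with tasters and that the ThxL interaction was divided by the within-tasters mean square.

```python
# Mean squares for the consistent subjects, copied from Table 7.
MEAN_SQUARES = {
    "Th": 8.417,
    "L": 3.740,
    "ThxL": 0.681,
    "Tasters x Th": 1.134,
    "Tasters x L": 0.761,
    "Within tasters": 0.222,
}


def f_ratio(effect, error):
    """Variance ratio of an effect against the chosen error term."""
    return MEAN_SQUARES[effect] / MEAN_SQUARES[error]


# Direct effects against their interactions with tasters (assumed pairing).
f_fungicides = f_ratio("Th", "Tasters x Th")
f_insecticides = f_ratio("L", "Tasters x L")
# Interaction against the within-tasters mean square (assumed pairing).
f_interaction = f_ratio("ThxL", "Within tasters")

print(round(f_fungicides, 2), round(f_insecticides, 2), round(f_interaction, 2))
```

These pairings give 7.42, 4.91 and 3.07, in agreement with the table. Testing the fungicide effect against the within-tasters mean square instead would give a much larger ratio, which is why the choice of error term matters when tasters differ in their response.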


[Figure 1: two rankit plots; horizontal axes: Log-Threshold Sucrose (X1)
and Log-Threshold Acid (X2).]

Figure 1. Graphic test for agreement with a normal distribution
of the coded threshold concentrations in 15 subjects
for sucrose and in 24 subjects for tartaric acid. Successive
concentrations increase by doubling from 1 = 0.034% of sucrose
and 1 = 586 x 10^[exponent illegible] moles per liter of tartaric acid;
at rankit 0, the straight lines pass through means of X̄1 = 3.210 for
sucrose and X̄2 = 3.834 for tartaric acid with slopes of b = 1/s,
where the standard deviation s = 0.63-5 and 1.651 respectively.



LITERATURE CITED

1. Armitage, P. Restricted sequential procedures. Biometrika, 44, 9-26
      (1957).

2. Baker, G.A., Amerine, M.A. and Roessler, E.B. Errors of the second
      kind in organoleptic difference testing. Food Research, 19,
      206-210 (1954).

3. Baker, G.A., Mrak, V. and Amerine, M.A. Errors of the second kind
      in an acid threshold test. Food Research, 23, 150-154 (1958).

4. Bennett, G., Spahr, B.M. and Dodds, M.L. The value of training a
      sensory test panel. Food Technology, 10, 205-208 (1956).

5. Berkson, J. A statistically precise and relatively simple method of
      estimating the bio-assay with quantal response based on the
      logistic function. J. Am. Stat. Assoc., 48, 565-599 (1953).

6. Blakeslee, A.F. Genetics of sensory thresholds: taste for phenyl
      thio carbamide. Proc. Nat. Acad. Sci., 18, 120-130 (1932).

7. Blakeslee, A.F. and Salmon, T.N. Genetics of sensory thresholds:
      individual taste reactions for different substances. Proc. Nat.
      Acad. Sci., 21, 78-90 (1935).

8. Bliss, C.I., Anderson, S.O. and Marland, R.E. A technique for testing
      consumer preferences, with special reference to the constituents
      of ice cream. Storrs Agr. Expt. Sta., Bull. 251, 1-20 (1943).

9. Bliss, C.I., Greenwood, M.L. and McKenrick, M.H. A comparison of
      scoring methods for taste tests with mealiness of potatoes. Food
      Technology, 7, 491-495 (1953).

10. Bliss, C.I., Greenwood, M.L. and White, E.S. A rankit analysis of
      paired comparisons for measuring the effect of sprays on flavor.
      Biometrics, 12, 381-403 (1956).

11. Bradley, R.A. Some statistical methods in taste testing and quality
      evaluation. Biometrics, 9, 22-38 (1953).

12. Campbell, W.I.P. and Blakeslee, A.F. Would a rose smell so sweet by
      any other name? Horticulture, 2[illegible], 333 (1948).

13. Carroll, M.B. Consumer product testing statistics. Flavor Research
      and Food Acceptance, Reinhold Publ. Corp., New York, 162-174 (1958).

14. Cotterman, C.W. and Snyder, L.H. Tests of simple Mendelian inheritance
      in randomly collected data of one and two generations. J. Am. Stat.
      Assoc., 34, 511-523 (1939).

15. Dykstra, O. A note on the rank analysis of incomplete block designs -
      applications beyond the scope of existing tables. Biometrics, 12,
      301-306 (1956).



16. Fenton, F.E. Judging and scoring milk. Farmer's Bull. 2111, 1-20 (1957).

17. Finney, D.J. Probit Analysis. 2nd Ed. Univ. Press, Cambridge (1952).

18. Fisher, R.A. and Yates, F. Statistical Tables for Biological,
      Agricultural and Medical Research. Oliver and Boyd, Edinburgh (1957).

19. Greenwood, M.L. and Salerno, R. Palatability of kale in relation to
      cooking procedure and variety. Food Research, 14, 314-319 (1949).

20. Gridgeman, N.T. Taste comparisons: two samples or three? Food
      Technology, 9, 148-150 (1955).

21. Gridgeman, N.T. A tasting experiment. Applied Statistics, 5, 106-112
      (1956).

22. Gridgeman, N.T. Application of quantal response theory to the
      cross-comparison of taste-stimuli intensities. Biometrics, 14,
      548-557 (1958).

23. Harries, J.M. Sensory tests and consumer acceptance. J. Sci. Food
      Agric., 4, 477-482 (1953).

24. Helgren, F.J., Lynch, M.J. and Kirchmeyer, F.J. A taste study of the
      saccharin "off-taste". J. Am. Pharm. Assoc., 44, 353-355 (1955).

25. Hopkins, J.W. Incomplete block rank analysis: some taste test results.
      Biometrics, 10, 391-399 (1954).

26. Jackson, J.E. and Fleckenstein, M. An evaluation of some statistical
      techniques used in the analysis of paired comparison data.
      Biometrics, 13, 51-64 (1957).

27. Jellinek, E.M. Clinical tests on comparative effectiveness of analgesic
      drugs. Biometrics Bull., 2, 87-100 (1946).

28. Mackey, A.O. and Jones, P. Discernment of primary tastes in water
      solution compared with judging ability for foods. Food Technology,
      8, 527-530 (1954).

29. Murphy, E.F., Clark, B.S. and Berglund, R.M. A consumer survey versus
      panel testing for acceptance evaluation of Maine sardines. Food
      Technology, 12, 222-226 (1958).

30. Murphy, E.F., Covell, M.R. and Dinsmore, J.S., Jr. An examination of
      three methods for testing palatability as illustrated by strawberry
      flavor differences. Food Research, 22, 423-439 (1957).

31. Pangborn, R.M., Simone, M. and Nickerson, T.A. The influence of sugar
      in ice cream. I. Consumer preferences for vanilla ice cream.
      Food Technology, 11, 679-682 (1957).




32. Peryam, D.R. and Haynes, J.G. Prediction of soldiers' food preferences
      by laboratory methods. J. Appl. Psych., 41, 2-6 (1957).

33. Reid, A.W. and Becker, C.H. A study of cocoa syrups for taste preference.
      J. Am. Pharm. Assoc., Sci. Ed., 45, 160-162 (1956).

34. Sampson, W.L. and Fernandez, L. Experimental convulsions in the rat.
      J. Pharmacol. Exptl. Therapeutics, 65, 275-280 (1939).

35. Scheffe, H. An analysis of variance for paired comparisons. J. Am.
      Stat. Assoc., 47, 381-400 (1952).

36. Youden, W.J. Experimental designs to increase accuracy of greenhouse
      studies. Contrib. Boyce Thompson Inst., 11, 219-228 (1940).


STATISTICS IN THE TEXTILE INDUSTRY
L.H.C. Tippett

British Cotton Industry Research Association
Shirley Institute, Manchester, England

Statistics is now so widely used in many industries and the usages are
so commonly understood, that it seems better for me to "highlight" a few
special features of the usage of statistics in the textile industry than to
attempt a wearisome, comprehensive list of the applications. In doing this
I shall deal with applications that have come under my notice in my work
with the British cotton industry, but I acknowledge that people working in
the cotton and other branches of the textile industry in several countries
could add much of interest to the subject.

My  paper  is  divided  into  two  parts.  Part  I  mentions  briefly  a  few 
general  points  and  Part  II  deals  more  fully  with  the  design  of  experiments. 

Part I. General


I  suppose  that  if  one  were  to  say  in  a  word  what  statistics  is  about 
(or  the  sort  of  statistics  under  consideration  at  this  conference)  one  would 
say  that  it  deals  with  variation  -  with  its  description  and  measurement,  and 
with  its  effects  on  scientific  inference  and  decision.  In  many  fields  -  in 
the manufacture of engineering components to a dimension, for example -
variation is little more than a nuisance, but in textiles it is so important an
attribute  of  the  quantities  of  technical  interest,  that  it  is  studied  in  its 
own  right.  Moreover,  not  only  is  the  degree  of  variation  important,  but  its 
pattern  is  also. 

For example, cotton yarns vary in mass per unit length or thickness
along their length and this variation can take the form of a mixture of
random fluctuations, of almost-periodic fluctuations (with a period-length,
phase and amplitude that vary from place to place) and of strictly periodic
fluctuations. The almost-periodic fluctuations are due largely to the
variation in the length of the fibres and are inherent in cotton spinning. The
strictly periodic fluctuations are usually caused by machinery defects such
as eccentric rollers or faulty gears, and can be eliminated. When yarn is
used, say, as weft or filling in a cloth or knitted into hosiery, periodic
variations even of slight degree can form a pattern that is unpleasing to the
eye, whereas random or almost-periodic variations of greater degree would in
the same circumstances be harmless.
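A strictly periodic component that is small beside the random variation can still be picked out by a periodogram, which plots the power of the series at each wavelength. The Python sketch below is an illustration added here, not a description of the instruments used in mills. The yarn readings are simulated, and the wavelength of 25 readings, the amplitude of 0.8 and the random standard deviation of 1.0 are hypothetical values chosen only to show the idea.

```python
import cmath
import math
import random


def periodogram(series):
    """Return (wavelength, power) pairs from a plain discrete Fourier
    transform of the mean-corrected series."""
    n = len(series)
    mean = sum(series) / n
    centred = [value - mean for value in series]
    ordinates = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * k * t / n)
            for t, value in enumerate(centred)
        )
        ordinates.append((n / k, abs(coefficient) ** 2 / n))
    return ordinates


# Simulated mass-per-unit-length readings (all values hypothetical):
# random variation of s.d. 1.0 plus a periodic fault of amplitude 0.8.
rng = random.Random(1)
WAVELENGTH = 25
yarn = [
    100.0 + rng.gauss(0.0, 1.0) + 0.8 * math.sin(2 * math.pi * t / WAVELENGTH)
    for t in range(400)
]

wavelength, power = max(periodogram(yarn), key=lambda pair: pair[1])
print(wavelength, round(power, 1))
```

The largest ordinate falls at the wavelength of the simulated fault, although the periodic amplitude is smaller than the standard deviation of the random part. In a mill the wavelength found would then be compared with roller circumferences and gear ratios to locate the defect.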

Thus, in the analysis of the causes and effects of yarn thickness
variation, account must be taken of the pattern. In the statistical
treatment a number of devices are used: periodograms, correlograms, and curves
relating the variance of mass per unit length to the specimen-length. But
of these, only some form of periodogram analysis leading to the
identification of period lengths and amplitudes has, as far as I know, led to
conclusions of technical importance. Electronic devices are now available for
measuring the degree of variation and for identifying periods, and these
are used in mills for appraising yarn quality and diagnosing the causes of


Samples are much used in textiles, and there are two special features
to which I call attention. The first is that in taking samples of fibres
account must be taken of a bias towards selecting long fibres, either by
adopting a technique that eliminates such bias (as is done for cotton) or
by calculating the bias for different modes of selection (as is done for
wool). The second feature is that hardly any textile appraisals are by
attributes (involving classification into defectives and non-defectives);
almost all are by measured variables. Most of the statistical literature
on industrial sampling and most of the sampling plans apply to attributes,
and so have little application to textiles.

Many experiments are done in the textile industry, in research
departments and institutes and in mills, sometimes in order to increase
technological knowledge for general application and sometimes in order to
provide information on the best conditions of processing for some particular
situation. There is much uncontrolled, and uncontrollable, variation in textile
processing, and so the statistical design of experiments finds important
application. Since that is the main subject of this conference, I shall
devote the remainder of this paper to it.


Part  II.  The  Design  of  Experiments 

Although most space in textbooks on the design of experiments is
devoted to statistical aspects, it is soon discovered by the practitioner
and is widely understood that the whole situation has to be taken into
account, technical as well as statistical. Dr. G.E.P. Box and his
colleagues of the Statistical Techniques Research Group at Princeton have
begun to study the wider aspects of experimentation systematically, and
I am sure that in the years to come we shall see important progress. At
present, however, the experimenter has to rely largely on unsystematized
experience and common sense. This is the sort of situation in which
case-histories are especially useful. Accordingly, I propose to discuss in
some detail a field of experimentation in textiles, and shall try to bring
out the general issues lying behind some of the particular considerations
involved.

I shall discuss experiments to investigate the sizing of warp yarns
for weaving. The warp threads are those that extend lengthways in a long
piece of cloth; the process of weaving consists in interlacing with them
the cross-threads, known as the "weft" or the "filling". During weaving,
the warp is subjected to a good deal of abrasion and to considerable
fluctuating tensions, and in order that it may withstand this rough
treatment it is "sized" - i.e. the yarn is given a protective coating of some
adhesive such as (for cotton) starch mixed with other ingredients. The
subject of experimentation is the determination of the most suitable
ingredients for the size and of the optimum amount to be put on the warp.

This subject has been investigated for many years, and a good deal is
known about the sizing of the older fibres such as cotton and rayon with
natural sizes (starches, gums, and gelatine); but the coming of the new
synthetic fibres and adhesives has given the subject a new lease of life
as one for investigation. Attempts have been made over the years to
elucidate the fundamentals of sizing and weaving, and to develop
laboratory tests; but the problem has proved to be intractable and, although
some progress has been made, practical action requires the information
derived from empirical experiments, conducted in a research institute or
the mill, in which yarns are sized in different ways and their weaving
performance observed. I shall discuss fully the topic of performance,
but for the present shall characterise it as the warp breakage rate. From
time to time during weaving the warp threads break and have to be mended.
The warp breakage rate is important, and a low rate is, of course, to be
desired.

What  Factors? 

In designing an experiment, the first thing to be decided is: what
variables, or, in the jargon, what factors shall be investigated. The
technologist's first answer will undoubtedly be the type of size and the
amount on the warp. The type of size is not a simple thing since there
are usually at least two ingredients: an adhesive such as starch or
polyvinyl alcohol (PVA) and a lubricant such as tallow. On consideration,
however, the technologist will agree that there are other factors that
have an effect on weaving performance and that should be considered. The
relative humidity of the atmosphere in which the weaving is done, the
complex of factors under the heading of loom settings, and the cloth
particulars (which may range from those for a fine cambric or poplin or dress
fabric to those for a coarse sheeting) are only a few.

According to the classical method of experimentation one would
investigate each of these factors, one at a time; but that is not good
enough. The optimum amount of size is very different for one based on
starch than for one based on (say) carob bean gum; the effect of relative
humidity is not the same for all sizes, yarns, and cloth constructions;
and so on. In statistical language there are interactions between the
factors, and for complete information all relevant factors must be
investigated in a so-called factorial experiment. The issue of the factorial
versus the classical experiment had once to be argued; now it is decided
and factorial experiments are generally regarded as the correct thing.
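A small numerical case may make the point about interactions concrete. In the Python sketch below every breakage rate is hypothetical and chosen only for illustration. The amount of size that is best for one adhesive is the worse amount for the other, so a search that varies one factor at a time from a starting treatment stops short of the best combination, while the 2 x 2 factorial finds it and also measures the interaction.

```python
# Hypothetical warp breakage rates for a 2 x 2 factorial (not from the paper).
rates = {
    ("starch", "low"): 9.0,
    ("starch", "high"): 5.0,
    ("gum", "low"): 4.0,
    ("gum", "high"): 6.0,
}

# Effect of raising the amount of size, separately for each type of size.
amount_effect_starch = rates[("starch", "high")] - rates[("starch", "low")]
amount_effect_gum = rates[("gum", "high")] - rates[("gum", "low")]

# Main effect of amount and the type x amount interaction.
main_amount = (amount_effect_starch + amount_effect_gum) / 2
interaction = (amount_effect_gum - amount_effect_starch) / 2


def one_at_a_time(start_type, start_amount):
    """Classical search: vary amount with type fixed, then type with the
    chosen amount fixed."""
    best_amount = min(("low", "high"), key=lambda a: rates[(start_type, a)])
    best_type = min(("starch", "gum"), key=lambda t: rates[(t, best_amount)])
    return (best_type, best_amount)


classical_choice = one_at_a_time("starch", "low")
factorial_choice = min(rates, key=rates.get)

print(main_amount, interaction)
print(classical_choice, factorial_choice)
```

With these figures the classical search settles on starch at the high amount, while the factorial shows gum at the low amount to be better. Both searches here use four runs, so the factorial costs nothing extra.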

But in practice difficulties arise. The number of factors can be
very large and if they are all included the experiment may become
unmanageably large. In the book Design and Analysis of Industrial Experiments,
edited by Dr. O.L. Davies, experiments in the chemical field with as many
as five factors are described, but such a scale of operation would be
impracticable in the field I am dealing with, and a selection of factors
has to be made. I suspect that more often than not it is possible to think
up more factors than can be dealt with in one experiment.

In sizing, the main ingredient of the size is the adhesive, and the
technologist will usually be able from his general knowledge and the results
of laboratory work to decide what other ingredients can reasonably be
incorporated, and in what proportions. In this way, type of size as a factor
can be reduced to the adhesive, although the situation once more becomes
more complicated if two adhesives are used. The simultaneous inclusion of
type of size (simplified in the way described) and the amount on the warp
cannot be avoided since their interaction is very important.




These two factors result in an experiment that is as large as can
usually be handled at one time and so the other factors are usually
excluded. Relative humidity and loom settings are troublesome to vary and
most technologists will be prepared to act on the assumption that their
influence on the optimum type and amount of size is of a second order of
importance. If the investigation is done in a mill, the management will
be interested in one cloth at a time, and so cloth particulars can be
excluded as a variable factor. A research institute serving an industry is
interested in a wide range of cloths, but will prefer to cover the range
by dealing with a limited number of typical cloths, and finds it
acceptable as well as convenient to have a separate experiment for each one.
Then, as results for each cloth are obtained, manufacturers weaving that
cloth or something near it can immediately apply them; and as the results
for different cloths accumulate, a pattern begins to emerge so that the
whole picture can be filled out without exhaustive investigation.


How  many  Levels? 

A second question to be decided is how many values of each factor
there should be in the experiment - in statistical language, how many
levels there should be. A related question is how they should be
disposed. For example, we might in one experiment have two types of size
(i.e. two levels of type) each at four amounts on the warp (i.e. four
levels of amount). This would give eight variations (termed treatments)
in all.
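The treatments of a factorial experiment are simply all combinations of the levels of its factors. As a minimal sketch, the Python lines below list them for two types of size at four amounts; the names of the sizes and the percentages are hypothetical placeholders, not values from the text.

```python
from itertools import product

# Hypothetical levels: two types of size, four amounts (per cent on the warp).
size_types = ["size I", "size II"]
amounts_percent = [6, 9, 12, 15]

# Every combination of one level from each factor is one treatment.
treatments = list(product(size_types, amounts_percent))

for number, (size_type, amount) in enumerate(treatments, start=1):
    print(number, size_type, amount)
```

The count is the product of the numbers of levels, here 2 x 4 = 8, which is why each added factor or level enlarges the experiment so quickly.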

When the factor is qualitative, as is the type of size, there is
little to say about the choice of levels except that it is the job of the
technologist. When the factor is a measurable variable, two levels are
enough provided there are grounds for believing that the relationship
between the measured effect of the factor (termed generally the response
and exemplified here by the warp breakage rate) and the level is nearly
linear over the range of interest, or at least that there is no maximum
or minimum in the curve. Many factorial experiments are done with two
levels of each factor, and such seem to be very suitable for exploring a
relatively unknown field in order to discover which factors are important.

In our sizing-weaving experiment, however, we have to take account
of the fact that the breakage rate-amount of size curve usually has a
minimum, and it is the breakage rate around this minimum that we require
to know. Moreover, it is not quite enough to know exactly the minimum -
we need to know the shape of the curve. For example, the curve might be
like that in Fig. 1,* rising more steeply on one side of the minimum than
the other, and it would then be important to know this. In ordinary mill
practice the amount of size cannot be controlled precisely, and in routine
production one would aim at a percentage of size somewhat in excess of A,
so that a small deviation below the aimed-at value would not lead to the
large increase in response that would result from a small deviation below
the value A.
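The argument for aiming somewhat above A can be put in numbers. The Python sketch below rests entirely on hypothetical figures: A is taken to be the amount at the minimum of the curve, the response is assumed to rise eight times as steeply below A as above it, and the amount actually achieved is assumed to miss the aimed-at value by up to one unit either way. It then finds the aimed-at value that gives the lowest average breakage rate.

```python
A = 10.0  # hypothetical percentage of size at the minimum of the curve


def breakage(size):
    """Hypothetical asymmetric response: steep below A, gentle above A."""
    if size < A:
        return 4.0 * (A - size) ** 2
    return 0.5 * (size - A) ** 2


# Assumed lack of control: achieved amount = aimed-at value + deviation,
# each deviation taken as equally likely.
DEVIATIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]


def expected_breakage(target):
    """Average response over the assumed deviations about the target."""
    return sum(breakage(target + d) for d in DEVIATIONS) / len(DEVIATIONS)


targets = [A + 0.25 * step for step in range(-4, 9)]
best_target = min(targets, key=expected_breakage)

print(best_target, expected_breakage(best_target), expected_breakage(A))
```

With these figures the best aimed-at value is 10.5, half a unit above A, and the average response there is about half of what it would be when aiming at A itself. How far above A one should aim depends on the shape of the curve on both sides, which is why the shape as well as the position of the minimum has to be known.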


* Figures are to be found at the end of this article.



In order to obtain the information required, at least four levels of
amount of size are necessary. For comparing two types of size, one would
require at least eight treatments - sometimes a formidable requirement.
If the two types of size are similar in general character, say two starches,
one might be able to reduce the number of treatments to (say) six by assuming
the same general shape of curve for response to amount for each type. Then,
I think that most technologists would use four treatments to establish the
curve for one type and two to establish the general level of response for
the other. If one had enough confidence in the identity of shape of the two
curves I suppose that the most efficient arrangement would involve three
treatments for each type, but I doubt if any technologist would have such
confidence.

If the types of size are very different, I think that most technologists,
if they could not have eight treatments in one experiment, would want to
conduct three experiments, each with perhaps four treatments. The first
experiment would explore the breakage rate-amount curve for one type, the
second for the other, and the third would establish the relationship between
the types at two amounts chosen after the separate curves have been
established. (I am presuming that the average level of response for each
experiment is uncontrollably different so that the results of the three cannot
be compared directly.)

Someone will be sure to say that such a set of experiments is too
expensive or troublesome. What we do then depends on the circumstances.
If an adequate experiment really is impracticable, simplifying assumptions
(or even guesses) may have to be made, and the resulting information may
be better than nothing. Or the information from an inadequate experiment
may be so slight as not to be worth the cost of the experiment. The last
thing we should do is to allow anyone to be deluded into thinking that
adequate information can be derived from an inadequate experiment.

Sometimes, there is no great pressure to reduce the number of
treatments, and hence of levels, for each factor: can there be too many? For
the purpose of this discussion let us suppose that we wish to explore the
breakage rate-amount of size curve for one type of size, that its shape is
roughly that of Fig. 1, and that eight or ten treatments in the experiment
are tolerable. Is it better to have eight or ten different amounts of
size or to have four or five different amounts, measuring the response
for each amount twice? This kind of question requires more discussion
than I can give here, and I will venture an opinion (for which I would
not go to the stake), which is that in most situations it is best to have
the minimum number of points necessary to delineate the broad outline of
the curve, and correspondingly to have the maximum information for each
point that experimental resources permit.


What  Levels? 

A decision has to be taken as to the range over which the experimental
variables, the factors, shall be varied. If there are only two levels for
any factor, and if the response curve is linear, the more widely the levels
are spaced the more precisely is the slope of the response curve determined;
but also, the more likely is the assumption of linearity to be seriously in
error. This statement, I think, could be extended. If the mathematical
form of the response curve is known, I suspect that for many forms the
more widely the levels are spaced the more precisely is the curve
determined for a given experimental error. But seldom is the form of the
response curve known; seldom is it of a simple mathematical form.
Technically we are only interested in the response curve in a certain region.
The phenomena behind the curve are complex and I doubt if information for
areas far outside the region of interest gives much information for areas
within that region. When the curve is like that of Fig. 1, for example,
very low amounts of size lead to catastrophic results and are to be
avoided; quite high amounts of size are tolerable, but even so they may
not be advisable. For example, the response curves for two sizes I and
II may be as shown in Fig. 2. Practically we would be interested in the
region between A and B, and I doubt if results at C would help.
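The statement about spacing and precision follows from the usual formula for a straight-line fit, in which the variance of the estimated slope is the error variance divided by the sum of squares of the levels about their mean. The Python sketch below applies it to two hypothetical two-level designs with the same number of runs; the levels and the unit error standard deviation are illustrative assumptions.

```python
def slope_variance(levels, sigma):
    """Variance of the least-squares slope when one response is measured
    at each listed level with error standard deviation sigma."""
    mean = sum(levels) / len(levels)
    sum_of_squares = sum((level - mean) ** 2 for level in levels)
    return sigma ** 2 / sum_of_squares


# Two runs at each of two levels; only the spacing differs (hypothetical).
narrow = slope_variance([4, 4, 6, 6], sigma=1.0)
wide = slope_variance([2, 2, 8, 8], sigma=1.0)

print(narrow, wide, narrow / wide)
```

Trebling the spacing divides the variance of the slope by nine. The formula says nothing about whether the line is the right model over the wider range, which is the reservation made in the text.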

Of  course,  in  practice  we  cannot  always  define  the  region  of  interest 
or practicability. Then we have to make the best guess we can from previous
knowledge of the kind of thing that happens, or a preliminary
exploratory  experiment  may  be  desirable.  In  sizing-weaving  experiments 
a  good  deal  of  prior  knowledge  is  available,  and  if  new  fibres  and  new 
sizing  substances  have  physical  properties  not  very  different  from  those 
previously  encountered,  it  is  not  difficult  to  suggest  a  suitable  range 
of  variation. 

At  best,  however,  the  range  of  interest  is  not  known  precisely,  and 
in  order  to  be  sure  of  covering  it,  the  experimental  range  should  extend 
slightly  beyond  the  presumed  range  of  interest. 

It  does  not  require  much  reflection  to  decide  that  the  experimental 
amounts  of  size  need  not  be  the  same  for  the  two  types,  nor  need  they 
cover  the  same  range.  For  example,  if  the  response  curves  were  as  shown 
in Fig. 3, as they might easily be, one would explore the region A - B for
size I and C - D for size II. There would be no interest in comparing the
sizes  at  the  same  amount.  This  is  convenient,  for  in  most  practice  it  is 
not  possible  to  control  the  amount  of  size  on  the  warp  closely.  One  aims 
at  a  certain  amount  but  only  achieves  something  fairly  near  it  and  then, 
by  subsequent  analysis,  determines  the  actual  amount. 

Seldom  is  there  enough  information  to  justify  one  in  spacing  the 
levels  for  each  factor  at  other  than  equal,  or  nearly  equal,  intervals 
within  the  chosen  region. 


What  to  do  with  Excluded  Factors? 

Factors  that  are  excluded  from  the  experiment  can  either  be  controlled 
at  a  constant  level  or  they  can  be  allowed  to  vary  and  contribute  to  the 
random  errors.  Which  we  do  depends  on  many  circumstances. 

For  example,  two  important  factors  excluded  from  our  sizing-weaving 
experiment  are  the  relative  humidity  of  the  atmosphere  in  the  weave  room 
and  the  loom  settings.  An  up-to-date  mill  will  have  the  relative  humidity 
controlled  at  a  certain  level,  and  results  applicable  to  that  level  will 
be appropriate; relative humidity there will be controlled. A less up-to-date
mill may not have such control, and then the relative humidity
should  be  allowed  to  vary  over  the  range  normally  experienced.  This  may 
present  difficulties  since  relative  humidity  may  have  a  seasonal  fluctua¬ 
tion  and  an  experiment  extending  over  a  substantial  part  of  a  year  may  be 
unduly  burdensome  and  protracted. 

Loom settings, on the other hand, vary somewhat from loom to loom in
most mills, variations being associated with loom overseer. These
variations must be covered by the experiment and contribute to the random
errors.

So  far  I  have  discussed  these  issues  from  the  view-point  of  the 
individual mill seeking empirical information for local application. A
research  institute  serving  an  industry  will  have  a  wider  interest  and, 
logically,  should  cover  the  full  range  of  conditions  that  occur  in  many 
mills.  If  the  experiments  are  done  in  mills  but  are  under  the  control 
of  the  institute,  it  will  be  practicable  only  to  treat  each  mill  experi¬ 
ment  independently  as  though  the  work  were  being  done  for  an  individual 
mill,  and  to  generalise  as  the  results  for  different  mills  accumulate. 
When  the  experiments  are  done  in  experimental  workrooms  the  experimenter 
will usually prefer to control all the excluded variables each at one
level, as far as possible. Then he will have sound results for a defined
set  of  conditions,  which  can  be  built  into  a  rising  edifice  of  knowledge. 
Any  generalisations  that  it  may  be  expedient  to  make  at  any  time  will  be 
the  result  of  speculation  guided  by  such  knowledge  as  is  available. 

Sometimes  it  is  not  easy  to  say  what  is  meant  by  constancy  of  a 
factor.  For  example,  consider  a  size  mixing  containing  an  adhesive  such 
as  starch,  and  a  lubricant  such  as  tallow.  The  effect  of  the  lubricant, 
although not unimportant, is secondary to that of the adhesive, and often
the  amount  and  type  of  adhesive  is  investigated,  the  lubricant  being  kept 
"constant"  in  type  and  amount  as  an  excluded  factor.  What  is  constancy 
here?  Would  one  keep  constant  the  absolute  weight  of  lubricant  per  100  lb 
of  yarn  sized  or  the  weight  of  lubricant  relative  to  that  of  the  adhesive? 
The only sure way of answering this question is to do a factorial experiment.
In the meantime, the usual view is that the lubricant lubricates
the  size  rather  than  the  yarn,  and  the  weight  relative  to  the  amount  of 
size  is  the  basis  adopted. 

What  Responses? 

Two general points arise when deciding for an experiment what
observations  shall  be  taken  and  what  measurements  shall  be  considered 
as  responses.  These  are  (1)  the  view  must  be  so  broad  that  all  rele¬ 
vant  effects  are  considered,  and  (2)  compromises  must  be  struck  when 
there are conflicting effects. I shall illustrate these.

In  our  sizing-weaving  experiments,  the  people  responsible  for 
production  and  wages  are  interested  in  reducing  warp  breaks  experienced 
in weaving to a minimum since they add to the weavers' work-load. The
quality control department are interested in cloth quality which, other
things being equal, is improved as warp breaks are reduced. But some
sizes might reduce warp breaks to a minimum but be deleterious to other
aspects of quality such as "cover" or "cannage". Further, some sizes give
low  warp  breakage  rates  but  may  be  difficult  to  remove  in  finishing,  and 
would  be  unacceptable  to  the  people  in  the  finishing  department.  Finally 
some  sizing  materials  are  more  expensive  than  others:  some  prepared 
starches,  for  example,  cost  twice  as  much  per  ton  as  their  natural  equiv¬ 
alents.  The  experimenter  should  take  all  these  considerations  into  account 
in  deciding  what  observations  to  take  and  how  to  appraise  the  results. 

If  all  aspects  -  warp  breaks,  cloth  quality,  and  so  on  -  can  be 
evaluated  in  terms  of  costs,  it  is  relatively  easy  to  decide  on  the  opti¬ 
mum  size.  But  such  evaluation  is  not  possible  for  all  aspects,  and  the 
technologist must assess the different results qualitatively and use
judgment  in  striking  the  best  compromise.  He  will  probably  choose  from 
the  sizes  that  give  nearly  the  lowest  warp  breakage  rate  those  that  are 
satisfactorily  removed  in  finishing,  and  if  there  is  further  room  for 
choice  of  these  he  will  select  those  that  give  the  best  cloth  quality. 

Sometimes  the  statistical  analysis  is  facilitated  by  mathematically 
transforming  the  variable  in  which  a  response  is  measured  -  by  analysing 
the square root of the breakage rate, for example. I have done this sort
of  thing  but  am  not  sure  that  such  action  is  not  sometimes  an  exercise  in 
statistical  virtuosity  rather  than  a  good  thing  to  do.  In  any  event  we 
must  remember  that  the  final  report  has  to  be  made  to  a  technologist  and 
all  figures  must  be  given  in  terms  that  mean  something  to  him.  He  can 
interpret  a  warp  breakage  rate  but  not  its  square  root. 
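
A minimal sketch of why the square root is a common choice of transformation,
assuming the number of breaks in a fixed exposure follows the Poisson law.
The variance of a Poisson count equals its mean, whereas the variance of its
square root stays close to 1/4 whatever the mean, provided the mean is not
small. The function names are illustrative, and the means in the loop are
arbitrary. Any figures obtained on the transformed scale would still need to
be converted back to breakage rates for the report.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected number is lam."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def variance_of_sqrt_count(lam):
    """Exact variance of sqrt(X) for X ~ Poisson(lam), by direct summation."""
    upper = int(lam + 20 * math.sqrt(lam) + 50)  # tail beyond this is negligible
    mean_root = 0.0
    mean_count = 0.0
    for k in range(upper + 1):
        p = poisson_pmf(k, lam)
        mean_root += p * math.sqrt(k)
        mean_count += p * k
    # Var(sqrt X) = E[X] - (E[sqrt X])^2
    return mean_count - mean_root ** 2

for lam in (20.0, 40.0, 80.0):
    print(lam, round(variance_of_sqrt_count(lam), 3))
```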


What  Experimental  Plan? 

The  experimental  plans  or  designs  now  available  are  many  more  than 
the  simple  randomised  blocks  and  Latin  squares  which  held  the  field  in  the 
early  days  of  the  subject.  In  the  experiments  I  am  discussing,  however, 
experimental treatments are few and simple plans are appropriate.

The  natural  experimental  unit  in  a  sizing-weaving  experiment  is  a 
warp, containing yarn to make several hundred yards of cloth, sized with
one  size  throughout.  This  will  go  independently  into  one  loom  in  the 
weave  room  and  will  take  between  two  and  six  weeks  to  weave.  At  the 
sizing  process  warps  are  produced  one  at  a  time  successively  from  the 
machine,  each  taking  perhaps  an  hour  to  run;  and  thirty  or  so  warps,  re¬ 
quiring  three  or  four  shifts  to  run,  form  a  set.  Sizing  is  an  almost 
continuous  process,  with  only  very  short  stops  at  the  end  of  each  warp 
and somewhat larger stops at the end of each set. The capacity of the
size tank is considerable, so that it is quite a business to change the
size in type, although the amount put on the warp can somewhat more easily
be changed in a downwards direction by adding water in the "sow box"
(which contains the size actually in the process).

The  problem  is  to  superimpose  on  this  industrial  set-up  an  experiment 
so  that  production  is  not  interfered  with  unduly.  Suppose  that  there  are 
two types of size I and II, and four amounts of each, leading to eight
treatments, say I1, I2, I3, I4, II1, II2, II3, II4, and that there are four
warps  for  each  treatment. 


The  statistician's  ideal  would  probably  be  an  arrangement  with  four 
blocks each of eight consecutive warps, and the treatments distributed at
random within each block. This would be intolerable to the mill since it
would involve changing the size at the end of each warp. The most that the
mill is likely to tolerate is a new type of size for each of four shifts,
with the amounts of size being successively reduced within each, so that
the  arrangement  would  be: 


Shift A:   I1    I1    I2    I2    I3    I3    I4    I4
Shift B:   II1   II1   II2   II2   II3   II3   II4   II4
Shift C:   II1   II1   II2   II2   II3   II3   II4   II4
Shift D:   I1    I1    I2    I2    I3    I3    I4    I4

Then  each  pair  of  shifts  would  form  an  independent  sub-experiment  with 
two  replicate  warps  for  each  treatment.  The  arrangement  violates  the 
canons  of  sound  experimentation  since  the  treatments  are  not  distributed 
at random (they are in order of decreasing amount of size), and the replicates
are consecutive. But this, or something like it, is the best that
the  mill  is  likely  to  tolerate. 
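
The contrast between the two arrangements can be sketched as follows. This is
an illustration only: the treatment labels (size type I or II followed by
amount 1 to 4), the function names, and the count of changes of size type as
a rough measure of the burden on the mill are all assumptions made for the
sketch.

```python
import random

# Eight treatments: size type (I or II) followed by amount (1 to 4).
TREATMENTS = [f"{size}{amount}" for size in ("I", "II") for amount in (1, 2, 3, 4)]

def randomised_blocks(n_blocks=4, seed=None):
    """Four blocks of eight consecutive warps, treatments in random order."""
    rng = random.Random(seed)
    return [rng.sample(TREATMENTS, k=len(TREATMENTS)) for _ in range(n_blocks)]

def mill_arrangement():
    """One size type per shift, each amount run twice in succession."""
    shift_sizes = {"A": "I", "B": "II", "C": "II", "D": "I"}
    return {
        shift: [f"{size}{amount}" for amount in (1, 2, 3, 4) for _ in range(2)]
        for shift, size in shift_sizes.items()
    }

def changes_of_size_type(sequence):
    """Number of times the size type changes between consecutive warps."""
    types = [label[:-1] for label in sequence]  # strip the amount digit
    return sum(1 for a, b in zip(types, types[1:]) if a != b)

ideal = [warp for block in randomised_blocks(seed=1) for warp in block]
mill = [warp for shift in mill_arrangement().values() for warp in shift]
print("changes of size type, randomised blocks:", changes_of_size_type(ideal))
print("changes of size type, mill arrangement: ", changes_of_size_type(mill))
```

Both arrangements use 32 warps, four for each treatment; only the order in
which the treatments come off the sizing machine differs.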

When  the  warps  are  produced  one  would  like  them  to  go  into  the  weave 
room  according  to  some  pattern,  but  it  is  likely  that  they  will  have  to  go 
into  the  looms  as  they  become  vacant,  the  warps  being  chosen  at  random  only 
if  several  are  available  when  one  is  called  for.  However,  this  arrangement 
is  likely  to  be  substantially  a  random  one. 

An  experiment  of  this  sort  is  not  valueless,  even  though  it  is  not 
entirely satisfactory. The effect of amount of size cannot be disentangled
from that of the order of sizing, but the order effects are
unlikely to be the same for the two sub-experiments and, with care,
should be small. Moreover, the variance between replicates within the
same sub-experiment can be compared with that between sub-experiments
to show whether there is a substantial position effect. I think that
technologists, with the background knowledge they possess, will easily
reach useful conclusions from the results of such an experiment.
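
One way the comparison of variances might be set out is sketched below,
assuming two sub-experiments with two replicate warps per treatment in each.
The within-sub-experiment variance is pooled from the replicate pairs; the
between-sub-experiment variance is computed from the two sub-experiment means
of each treatment and scaled to a per-warp basis, so that the two are
comparable when there is no position effect. The function name and the
breakage rates are invented for illustration and come from no experiment.

```python
def within_and_between(rates):
    """Pooled per-warp variances within and between sub-experiments.

    rates maps each treatment to two pairs of replicate breakage rates,
    one pair for each sub-experiment.
    """
    within_terms = []
    between_terms = []
    for (a1, a2), (b1, b2) in rates.values():
        # Variance of a pair of replicates (1 degree of freedom each).
        within_terms.append((a1 - a2) ** 2 / 2)
        within_terms.append((b1 - b2) ** 2 / 2)
        # Variance of the two sub-experiment means, times the 2 warps per mean.
        mean_a = (a1 + a2) / 2
        mean_b = (b1 + b2) / 2
        between_terms.append(2 * (mean_a - mean_b) ** 2 / 2)
    within = sum(within_terms) / len(within_terms)
    between = sum(between_terms) / len(between_terms)
    return within, between

# Invented breakage rates for two treatments, for illustration only.
example = {
    "I1": [(1.0, 3.0), (5.0, 7.0)],
    "II1": [(4.0, 6.0), (5.0, 5.0)],
}
within, between = within_and_between(example)
print("within:", within, "between:", between, "ratio:", between / within)
```

A ratio well above one would point to a position effect; a formal test would
refer the ratio to the F distribution with the appropriate degrees of freedom.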

I  think  that  when  experiments  are  superimposed  on  normal  factory 
production,  it  will  usually  be  advisable  to  have  two  or  three  small 
independent sub-experiments and to make the arrangement within each
sub-experiment simple to operate, introducing such randomisation and subtleties
of arrangement as are expedient, but not worrying overmuch if the arrangement
is more systematic than a statistician would like. A former colleague,
Mr. R.E. Peake, describes an experiment in a spinning mill (Applied
Statistics, 2, 1953, pp. 184-192) in which a Latin square arrangement would have
been appropriate were it not that that would have involved a certain group
of machines working continuously for several weeks on the same product.
This condition could not be ensured and the experiment had to be divided
into independent sub-experiments, each lasting about a week. Within each
sub-experiment a random arrangement of treatments was feasible, so that
the whole formed a randomised block experiment.


When an experiment is done at a research station or institute,
complication in the arrangement is practicable and may be desirable.
For example, in a sizing-weaving experiment, we may have four warps each
with  a  different  size,  and  each  divided  into  four  sub-warps.  These  can 
be  woven  simultaneously  in  four  looms  and  at  the  end  of  each  sub-warp  the 
warps  can  be  interchanged  between  looms  on  a  Latin  square  plan.  Then  in 
the  analysis,  loom  effects,  which  can  be  quite  substantial  and  contribute 
to  the  errors  in  the  above  plans  for  factory  use,  are  eliminated  from  the 
comparisons  between  warps.  Adequate  replication  of  the  sizing  can  be 
achieved  by  having  two  or  more  sets  of  four  warps.  If  there  are  eight 
treatments,  an  8  x  4  plan  may  be  used. 
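
The interchange of warps between looms can be sketched as a cyclic Latin
square, in which rows are weaving periods (one sub-warp each), columns are
looms, and entries are warps. The warp and loom labels and the function name
are hypothetical, and the cyclic square is only one of many possible squares;
in practice one would presumably choose a square at random.

```python
def latin_square_plan(warps, looms):
    """For each period, map every loom to the warp it weaves.

    In period p, warp i is placed in loom (i + p) mod n, so that every warp
    visits every loom once and every loom is occupied in every period.
    """
    n = len(warps)
    if len(looms) != n:
        raise ValueError("need as many looms as warps")
    return [
        {looms[(i + p) % n]: warps[i] for i in range(n)}
        for p in range(n)
    ]

warps = ["W1", "W2", "W3", "W4"]   # four warps, each with a different size
looms = ["L1", "L2", "L3", "L4"]
plan = latin_square_plan(warps, looms)
for period, allocation in enumerate(plan, start=1):
    print(period, [allocation[loom] for loom in looms])
```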


What  Size  of  Experiment? 

In  principle  two  things  are  required  to  decide  the  size  of  experiment: 
the  precision  with  which  the  response  is  to  be  determined,  and  the  extent 
and  pattern  of  error  variations  likely  to  be  encountered.  On  the  second, 
a  good  deal  is  known  for  sizing  and  weaving  experiments  as  conducted  in 
Lancashire. E. Bradbury and Mr. H. Hacking have dealt with experiments
in factories (Journal of the Textile Institute, 40, 1949, pp. P532-P551) and
Mr. V.R. Main and I have dealt with experiments as conducted at the Shirley
Institute (Journal of the Textile Institute, 32, 1941, pp. T209-T220). I
do  not  think  that  it  is  necessary  to  do  more  than  make  crude  estimates 
from  standard  errors  calculated  on  large-sample  theory. 

In  many  experiments  in  textiles  we  are  interested  in  the  rate  at  which 
the yarn breaks in processing, or in the incidence in time or space of
various  defects,  and  these  are  chance  events  distributed  more  or  less  at 
random,  i.e.  more  or  less  according  to  the  Poisson  or  negative  exponential 
law.  This  is  convenient  because  it  enables  us  to  calculate  in  advance  how 
large  an  experiment  needs  to  be.  In  practice,  there  are  other  uncontrolled 
variations  superimposed  on  the  chance  variations  so  that  the  size  of  exper¬ 
iment  so  determined  is  too  small.  Nevertheless  the  calculations  are  useful 
in  showing  roughly  the  scale  of  experimentation  required,  and  in  setting  a 
lower  limit.  It  is  disconcerting  to  many  experimenters  and  practical  men 
to  find  that  the  necessary  scale  is  much  larger  than  anything  that  they  had 
previously  contemplated,  and  that  carefully  controlled,  small-scale  experi¬ 
ments,  perhaps  with  the  warp  yarns  to  be  compared  woven  side  by  side  in 
strips  in  the  same  loom,  do  not  suffice.  Such  arrangements  cannot  reduce 
the  purely  chance  variations. 
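
A minimal sketch of the advance calculation, assuming breaks follow the
Poisson law exactly. A count of N breaks then has standard error equal to the
square root of N, so the relative standard error of an estimated rate is
about one over the square root of N, however the exposure is arranged. The
function names, the rates, and the unit of exposure are hypothetical; as noted
above, uncontrolled variations make the true requirement larger, so the
figures are lower limits.

```python
import math

def breaks_needed(relative_se):
    """Smallest number of breaks giving the stated relative standard error."""
    return math.ceil(round(1.0 / relative_se ** 2, 9))

def exposure_needed(expected_rate, relative_se):
    """Exposure (in the units of the rate) expected to yield that many breaks."""
    return breaks_needed(relative_se) / expected_rate

def exposure_to_compare(rate_1, rate_2, z=2.0):
    """Exposure per treatment for the difference to equal z standard errors.

    With exposure T on each treatment, the difference of the two estimated
    rates has variance (rate_1 + rate_2) / T.
    """
    difference = rate_1 - rate_2
    return z ** 2 * (rate_1 + rate_2) / difference ** 2

print(breaks_needed(0.1))                 # breaks for a 10 per cent standard error
print(exposure_needed(2.0, 0.1))          # at 2 breaks per unit of exposure
print(exposure_to_compare(1.0, 1.5))      # to separate rates of 1.0 and 1.5
```

The first function makes the point of the paragraph above: the number of
breaks required depends only on the precision wanted, so weaving the yarns
side by side in the same loom cannot reduce it.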

If no prior data are available it is necessary to proceed in a sequential
way,  starting  with  a  fairly  small  experiment,  examining  the  results,  and  then 
extending  the  experiment  stage  by  stage  until  adequate  precision  is  attained. 
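
The stage-by-stage procedure might be sketched as below, again assuming
Poisson breaks so that the relative standard error of the rate is roughly one
over the square root of the accumulated number of breaks. The stopping rule,
the function name, and the counts are assumptions for illustration; a real
experiment would also have to allow for the other uncontrolled variations.

```python
import math

def stages_until_precise(stage_breaks, target_relative_se):
    """Return (stage, total breaks) at which the target precision is reached.

    stage_breaks lists the breaks observed in each successive stage.
    Returns (None, total) if the target is not reached in the stages given.
    """
    total = 0
    for stage, breaks in enumerate(stage_breaks, start=1):
        total += breaks
        if total > 0 and 1.0 / math.sqrt(total) <= target_relative_se:
            return stage, total
    return None, total

# Invented counts of breaks in three successive stages.
print(stages_until_precise([30, 40, 50], target_relative_se=0.1))
```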


Execution of Experiment and Collection of Results

It  is  axiomatic  that  after  an  experiment  has  been  planned,  the  specified 
procedures  and  conditions  should  be  closely  adhered  to  and  the  data  should 
be  correctly  recorded.  These  things  are  for  the  technologist  or  experimenter 
rather  than  the  statistician,  and  are  apt  to  be  taken  too  much  for  granted. 


Good  statistical  design  is  not  a  substitute  for  careful  experimental 
control  --  it  is  complementary.  In  our  weaving  experiments  at  the  Shirley 
Institute,  we  have  found  that  by  unremitting  attention  to  detail,  the 
precision  of  the  results  has  been  improved  enormously.  This  is  not  a 
matter about which I can find much to say, but I do emphasize its
importance. 


Analysis  of  Results 

The  standard  statistical  procedure  for  treating  the  results  of  an 
experiment is to analyse the variance and test the significance of the
various  effects.  This  is  always  a  good  thing  to  do  in  order  to  restrain 
the  ever-optimistic  experimenter  from  reading  into  the  results  more  infor¬ 
mation  than  is  there.  And  the  comparison  of  the  error  variance  for  a 
particular experiment with error variances commonly experienced provides
a  check  that  the  control  has  been  good. 

But  the  main  scientific  or  technological  interest  lies  in  measuring 
the  response  for  different  values  of  the  variable  -  in  measuring,  for 
example,  the  relationship  between  amount  and  type  of  size,  and  the  mean 
warp  breakage  rate.  For  this,  the  plotting  of  graphs  in  the  usual  way 
provides  a  great  help,  and  we  have  found  that  if  a  simple  experiment  is 
well  planned  and  carried  out,  the  technologist  can  interpret  the  results 
without  recourse  to  recondite  statistical  methods.  Statistical  principles 
find their most important application in the planning stage rather than
in  the  stage  of  analysis  of  results. 


[Figure: axis label "AMOUNT OF SIZE"]


